Everyone wants a piece of Google's success. As unstructured and semi-structured data grow to comprise more than 80% of corporate information, all types of organizations in virtually every industry are trying to figure out how to exploit Web 2.0 and apply so-called 'Cloud Computing' models to monetize enormous volumes of Web data and user interactions. Indeed, why let Google have all the fun (and profits)?
In the book Wikinomics, authors Don Tapscott and Anthony Williams make the case that new communications and collaborative Web technologies are democratizing value creation. Specifically, the authors posit that organizations will increasingly open up their IP to catalyze innovation by thousands or even millions of collaborators, unleashing unprecedented value. This realization, combined with the confluence of massive data explosions, the emergence of Google-like business models and a flood of inexpensive software technology, is presenting huge opportunities for organizations around the world.
The question is: what are the requirements and characteristics of storage that will support this opportunity and allow companies to compete by putting in place cost-effective, massive Web-scale (versus enterprise-scale) infrastructure for these business models? Here are some of the more obvious requirements and differences from today's enterprise storage:
- Petabyte versus terabyte scale,
- Millions versus thousands of users,
- Thousands of network nodes versus hundreds of servers,
- Highly distributed versus command and control,
- Self-healing versus backup-and-restore,
- Auto-classified versus admin-classified data,
- Intelligence in the data versus intelligence in the application,
- Semantic building blocks versus discrete files and records,
- Cheaper than dirt and simple to operate versus really expensive and complex to manage.
The most prominent example of this type of storage infrastructure today is the Google File System, where globally distributed data are stored, indexed, searched and rapidly retrieved by one billion Internet users, in a business supported predominantly by advertising. Other examples include Facebook and Wikipedia, which provide a platform for users to generate their own content, creating massive repositories of information.
However, these are only three examples. More mature models like eBay are evolving, and new models are emerging within the telco and managed service provider spaces, as well as in other online businesses, where users are enticed with free services or content and then offered incremental value for pay. By introducing transactionality into the equation, these businesses are blurring the lines between structured and unstructured data, creating incremental demands on resiliency, state and performance.
In addition, organizations are researching so-called Web 3.0 models, where relatively small applications interact with each other and access data that resides on the Web or in a 'cloud.' These applications tend to be highly customizable, very fast and available on a variety of mobile devices. They also contain richer media, and observers can expect bandwidth requirements to keep escalating, to orders of magnitude more than today's so-called Web 2.0 applications require. Importantly, Web 3.0 applications are expected to leverage quasi-artificial intelligence in a fashion that involves human interaction and collaborative filtering (e.g., Flickr, Digg and collaborative search engines) to enable mashups of seemingly disparate content placed in a user context. To get a sense of what these storage capabilities will look like, see David Floyer's Peer Incite, 'Web 3.0 clustered storage networks.'
Clearly, not all organizations will build out the storage infrastructure to support these applications themselves, and many will source these capabilities from service providers. What is clear is that the vast majority of organizations, large and small, will participate in some way in this emerging marketplace.
Action Item: Organizations must begin to document the market requirements for evolving and emerging Web businesses and understand the parameters, constraints and opportunities they present. Storage infrastructure product requirements should evolve from this thinking, supported by a new class of Web-scale storage products from both established players and new entrants.