Storage Peer Incite: Notes from Wikibon’s June 3, 2008 Research Meeting
When Google first introduced Gmail and the then-unique storage architecture underpinning it, it caused a sensation in the industry. CIOs and technologically sophisticated CEOs began to ask why they were paying so much for their storage when Google could offer seemingly limitless space for so small an investment. Since then, increasing numbers of consumers have become comfortable storing and sharing digital photos and movies on services like Picasa and backing up their computers to services offered by Google and competitors charging pennies per GByte monthly. Sole-proprietor and very small businesses are turning to these and similar services to rent digital storage just as they might rent physical storage space.
Recognizing this trend, the Wikibon community this week took a hard look at the state of the art in cheap, online storage. Is it ready for prime time? What kinds of data could companies move onto such systems to save operating costs? Should companies consider using these services at all, or are they best left to consumers? Will a new generation of knowledge workers expect their data to be on the Web, accessible to them wherever they are, through whatever computer they happen to be using? Our conclusions to these and similar questions are summarized in the articles below.
G. Berton Latamore
The industry is abuzz with the concept of cloud computing, which is being positioned as a next-generation computing model that allows organizations of all sizes to tap into a network of compute power, storage, and bandwidth, and ‘dial up’ resources as needed to suit their businesses. In reality, the concept of cloud computing as applied to storage is not new, as the fundamental underpinning of cloud storage is the ability to store, access, share, and protect information over public IP-based networks. What is new is the maturity level of IP-based storage, combined with service business models that enable support of more applications. The Wikibon community sees three main business drivers for this trend:
- Increasing pressure on CIOs to get lean and outsource non-core activities;
- A younger buying demographic that is increasingly comfortable with Web 2.0 tools and business models;
- The nature of emerging applications relying on mashups and interactions between multiple discrete components.
Inherent constraints and barriers to cloud approaches exist, including the fact that the cloud doesn’t offer the bandwidth and low latency needed for many applications, especially those with a high number of user interactions (e.g. desktop publishing) and high write-to-read ratios (e.g. transactional systems). Also, many organizations are still uncomfortable bearing the risks associated with consumer-like service models that involve turning over security, privacy, and data protection to an external service provider.
What will the architecture of cloud storage look like? Many examples of cloud computing and cloud storage exist today, including user-direct services such as Hotmail, Gmail, search, Google Apps, and Picasa; software-as-a-service (SaaS) applications such as Salesforce.com and 37signals’ Basecamp; and infrastructure services including Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3), Mozy’s online backup, YouSendIt’s file transfer service, and many others. These first-generation cloud services provide a glimpse of what’s to come and what could change significantly in computing architectures. Specifically, today’s systems are typified by:
- A set of standard servers running high function OSes including Windows, Linux, or Unix;
- Databases and middleware (e.g. SQL Server, Oracle) underpinning applications;
- Modular or monolithic storage with expensive software content performing necessary data management services including placement, redundancy, and recovery;
- An expensive and hopefully reliable network of IP devices, SANs, and related infrastructure.
What’s different about cloud computing? The Google File System and UC Berkeley’s OceanStore Project point the way to next-generation application and data architectures. These systems can be described as global and persistent, meaning they scale to support billions of users and are geared for information that must be perpetually protected (e.g. Web data or archives). The attributes of these systems include:
- Storage is placed in close proximity to the access point, storing the data in the best location to optimize network performance;
- Storage is protected in several ways, making it private, indestructible, resistant to denial of service, and able to withstand wide-scale disasters;
- The storage is auto-managed, meaning diagnosis and repair must be completely automated and simultaneous with operations (i.e. self-healing and non-disruptive);
- The system is able to ingest new hardware and software technologies as they become available;
- Coherence of data is compulsory, i.e. copies of data that are spread throughout the network are consistent;
- The underlying storage is dirt cheap to support this type of scale.
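The coherence attribute in particular can be made concrete with a toy sketch (purely illustrative; no real cloud store is this simple): if every write goes to a majority of replicas and every read consults a majority, any read quorum must overlap any write quorum, so the newest copy is always seen even though no single machine holds the truth.

```python
import random

class QuorumStore:
    """Toy replicated store: coherence via overlapping read/write majorities."""

    def __init__(self, num_replicas=5):
        self.replicas = [{} for _ in range(num_replicas)]
        self.quorum = num_replicas // 2 + 1  # a strict majority

    def write(self, key, value, version):
        # Write to a random majority; any later read quorum must overlap it.
        for replica in random.sample(self.replicas, self.quorum):
            replica[key] = (version, value)

    def read(self, key):
        # Read a random majority and return the newest version seen.
        seen = [r[key] for r in random.sample(self.replicas, self.quorum) if key in r]
        return max(seen)[1] if seen else None

store = QuorumStore()
store.write("photo.jpg", "version 1", version=1)
store.write("photo.jpg", "version 2", version=2)
assert store.read("photo.jpg") == "version 2"  # stale replicas are outvoted
```

The class and method names here are invented for illustration; production systems layer failure detection, repair, and versioning schemes far richer than a single integer on top of this basic overlap argument.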
What are the benefits of this approach? To understand the benefits of cloud infrastructure it’s useful to look at two constituencies: users and IT/service providers.
For users:
- Anywhere/anytime access to data;
- Improved reliability and security of data (relative to laptop storage);
- Wider access to free software supported by advertising;
- Software that’s up-to-date with less malware.
For IT and service providers:
- Lower costs for proofs-of-concept and one-offs;
- Shared infrastructure costs (e.g. electricity and real estate);
- Lower CAPEX;
- Improved utilization and peak management;
- Outsourcing of infrastructure maintenance;
- Separation of application code from physical resources.
What are the drawbacks and risks of this approach? Wikibon members see several downsides to cloud computing that users should consider, specifically:
- Security and privacy concerns – most computer crime is internal or from negligent acts;
- Difficulty of separating applications from storage;
- Concerns about loss of control and exit strategies, especially when writing applications to proprietary APIs;
- Subscription OPEX, which over time can be more expensive on a TCO basis;
- Risks of service provider viability;
- Risk that management, performance, and reliability will not be as controllable as today’s computing models.
What does cloud storage mean for IT organizations? Wikibon members discussed the question: “Is there any doubt that cloud storage represents the future model?”
While the consensus was that, yes, some doubts remain, Wikibon members concluded that the groundswell of interest in the consumerization of IT and the strong desire to outsource non-core activities will override security concerns. In addition, a younger decision-making demographic is clearly supportive of new computing models, including the cloud.
Moreover, while cloud storage is still in its infancy, it represents an opportunity for organizations to offload work that can be placed in the cloud. Candidates include backup of remote desktops, file services, certain user application licenses, a growing list of social networking and Web 2.0 applications, and, most importantly, email.
While there is a tendency to run from the hype of such a topic, experimentation with this concept will probably yield positive outcomes at least with respect to defining solutions around this type of computing infrastructure. Importantly, organizations must be cognizant of exit strategies (for data and applications) when investing in the cloud and its associated APIs.
Action item: Users should look at the backlog of opportunities and choose an area where cloud computing is potentially appropriate (e.g. archiving, remote backup and certain software development projects). Choose a low risk initiative and experiment with the objective of gaining knowledge, confidence, and an understanding of the critical metrics involved in moving applications into the cloud.
While cloud computing is a hot topic today, how should users approach this nebulous resource? The most important step is to recognize that the key dynamic of cloud computing is shifting demographics. A good way to get a sense of what is happening in your organization is to:
- Conduct a meeting about your organization’s posture on cloud computing with your most senior managers – preferably those over 40.
- Conduct a meeting about your organization’s posture on cloud computing with your least senior managers – preferably those under 30.
Make sure lawyers over 40 and under 30 are present for both meetings. Then try to reconcile the two. The older group will show conservatism and a huge concern for security. The younger group will demonstrate increasing comfort with Web 2.0 tools and business models and want to actively embrace them.
While cloud computing holds great promise, it is still too immature for many traditional applications. Cloud computing requires new infrastructure, including hardware, file systems, data protection techniques, and security approaches. Note that elastic computing such as Amazon Elastic Compute Cloud (EC2) is actually more interesting, but it should only be used for experimental purposes.
Nonetheless, by all means, increase awareness and experiment with cloud computing. Explore available services; investigate file systems such as OceanStore. Don’t let information risk management policies cut users off from experimenting with new services. Design processes to allow this to be done safely rather than not done at all.
Ultimately this will support the goal of getting leaner faster. Applications that should be considered early in this phase include remote backup, archiving (notwithstanding the privacy and security concerns), email, and social networking activities.
Also take note that it is hard to separate applications from their storage, so either both must go into the cloud together, or the move should be deferred until the picture becomes clearer.
In his new book "The Big Switch: Rewiring the World, from Edison to Google," Nicholas Carr argues that in the early 1900s companies stopped generating their own power with steam engines and connected to the new electric grid. The driver, according to Carr, was ubiquitous access to cheap power, which set off a domino effect of economic and social transformations that ushered in the modern business world. The book's premise is that a similar trend has begun with the Internet's global 'computing grid', where massive information factories are supplying data and software services to consumers and businesses.
Without debating the nuances of Carr's argument, the sentiment in board rooms is clear. The directive to CIOs is to outsource processes that are non-core. Everything's on the table, from HR and procurement to IT infrastructure such as email, archiving, and remote backup. A natural tension is brewing between the need to get mean and lean and the entrenched IT processes of organizations. The tensions are not just political; they involve real concerns about privacy, security, application architecture, etc. The tipping point may be a younger demographic that is increasingly comfortable with the inherent risks of online services in general and the concept of placing data in the cloud specifically.
Action item: The organizational imperative of the cloud is to recognize the natural tension between leveraging low-cost Internet services and IT's firm grip on business processes. Organizations must deliberately facilitate discussions between the various constituents (e.g. risk management, audit, compliance, and LOBs) to determine the most logical path forward. Key parameters and metrics should factor in the degree of risk, the value of speed and flexibility, the ability to monetize proprietary IT, and costs.
The current enterprise data-center model for system design has many architectural layers, each usually with more function than is actually required. Taking just the storage layer, we find that the disk drives come with controllers and software that help manage allocation, performance, data placement, and remote and local copies. Complex software, procedures, and hardware manage backup and recovery. Other architectural layers include middleware, presentation services, and network layers.
The type of architecture that typifies cloud computing does away with layers of hardware; the only components are 1U commodity servers, disks, and IP switches. The application is also lean and mean, including all the functions required and no more. For example, the storage functions of allocation, data placement for performance, and recovery are performed by a file system. Examples of file systems with the necessary functionality are UC Berkeley’s OceanStore Project and the Google File System. The data is virtualized, and multiple coherent copies can be stored across a network of servers and storage. The file system virtualizes blocks of data and keeps track of where they are and how safe the data is. The file system assumes that software and hardware can and will fail, and will recover seamlessly without user impact. Other functional components of the system are added only as required, and often from open source.
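The core mechanics can be illustrated with a toy sketch (this is not the actual design of GFS or OceanStore, and all the names are invented): the file system keeps a map from virtual block IDs to the servers holding replicas, and when a server dies it re-replicates the lost blocks onto surviving hardware, so reads continue without user impact.

```python
class CloudFS:
    """Toy cloud file system: virtual blocks mapped to replica servers,
    with automatic re-replication when a server fails (self-healing)."""

    REPLICATION = 3

    def __init__(self, servers):
        self.servers = {s: {} for s in servers}  # server -> {block_id: data}
        self.block_map = {}                      # block_id -> list of servers
        self.next_block = 0

    def put_block(self, data):
        block_id = self.next_block
        self.next_block += 1
        targets = list(self.servers)[:self.REPLICATION]
        for s in targets:
            self.servers[s][block_id] = data
        self.block_map[block_id] = targets
        return block_id

    def get_block(self, block_id):
        # Any surviving replica will do.
        for s in self.block_map[block_id]:
            if s in self.servers:
                return self.servers[s][block_id]
        raise IOError("all replicas lost")

    def fail_server(self, server):
        # Hardware is assumed to fail; copy each lost block to a spare server.
        lost = self.servers.pop(server)
        for block_id in lost:
            holders = [s for s in self.block_map[block_id] if s in self.servers]
            spares = [s for s in self.servers if s not in holders]
            if spares:
                self.servers[spares[0]][block_id] = self.servers[holders[0]][block_id]
                holders.append(spares[0])
            self.block_map[block_id] = holders

fs = CloudFS(["rack1", "rack2", "rack3", "rack4"])
bid = fs.put_block(b"customer archive data")
fs.fail_server("rack1")                       # simulate a hardware failure
assert fs.get_block(bid) == b"customer archive data"
```

A real implementation adds checksums, lease management, and rebalancing, but the essential slimming is visible even here: placement, redundancy, and recovery live in one thin layer instead of separate array, backup, and replication products.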
The result is a complete re-architecture and slimming down of the system. The functions that were provided by the storage and backup systems are now provided by the file system in a single architectural layer. What was an application supported by “fat” (over-functioned) layers is now reduced to “slim” functionality in very few layers. The functionality included in each application is chosen to fit the business requirements.
EMC’s Hulk is a good example of hardware designed for this new paradigm. It is a bare-metal assembly of 1U Intel servers, IP switches, and 1TB commodity disk drives. It is designed so that an operator can swap out each component while the system is running, and it comes with thin margins, no software, and no maintenance costs. EMC’s Maui (based on OceanStore, or just looking like OceanStore?) might provide one of many alternative software options that can be run on Hulk.
The benefits of such an architectural approach, for appropriate applications, are very significantly reduced computing costs and increased potential to outsource operations.
Action item: IT executives should put their best and brightest architects on this problem and encourage experimentation with completely different ways of delivering application value to the enterprise and to enterprise customers, suppliers, partners, and other stakeholders. Storage executives and professionals should become conversant with the emerging technologies and become trusted advisers to application designers and development teams on file systems and the management of data.
Storage vendors face increasing pressure from customers to describe their future role with cloud computing. Still in the experimental stage but coming fast, the advent of cloud computing represents an alternative source of delivering computing applications over the public Web and potentially over private intranets as well. The concept of cloud storage (within cloud computing) centers on the ability to store, retrieve, share, manage, and protect information over public, and possibly private, IP-based networks for indefinite time periods. This means, for example, that data protection techniques will need to extend well beyond those used for laptop and mobile appliances today.
Nearly all storage vendors will have some role to play within cloud storage, and customers will increasingly want to know what their roles will be. Cloud systems will scale to support billions of concurrent users and are aiming for the storage and retrieval of information that must be perpetually protected anywhere and anytime.
Action item: Cloud computing will drive existing storage and data management concepts to the next level and on a much greater scale than ever before. With cloud computing generating more visibility and interest every day, storage vendors should be prepared to clearly articulate their role(s) with cloud storage and describe what they can do to enable end-users to effectively manage this new wave of computing applications. The time to start preparing these strategies is now.
As cloud computing matures, it will create many opportunities to move applications and storage out of the data center and into the cloud. It will be important to pick your spots and be cognizant of the switching costs involved in cloud hopping. In-house development should use appropriate open-source APIs and functionality to minimize the overhead of migrating to another platform. For both enterprise IT and enterprise users, it will also be imperative to establish legal ownership of code and data in the event of bankruptcy or sale of part of the cloud.
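One way to keep those switching costs down, sketched here as a hypothetical example (the interface and class names are invented, not from any vendor SDK), is to isolate application code behind a provider-neutral storage interface, so that cloud hopping means swapping one adapter rather than rewriting the application.

```python
from abc import ABC, abstractmethod

class BlobStore(ABC):
    """Provider-neutral storage interface; application code sees only this."""

    @abstractmethod
    def put(self, name, data): ...

    @abstractmethod
    def get(self, name): ...

class LocalStore(BlobStore):
    """In-house backend; a hypothetical S3 or Mozy adapter would implement
    the same two methods against the provider's own API."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = data

    def get(self, name):
        return self._blobs[name]

def archive_report(store: BlobStore, name, data):
    # Application logic depends on the interface, not the provider,
    # which is what keeps the exit strategy cheap.
    store.put(name, data)

store = LocalStore()
archive_report(store, "q2-report.pdf", b"report bytes")
assert store.get("q2-report.pdf") == b"report bytes"
```

The legal ownership question raised above is orthogonal: even with a clean adapter boundary, contracts must still establish who owns the data sitting behind the provider's side of the interface.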
Action item: Executives responsible for IT, business risk, and compliance should work together (and with managers under 30) to ensure that the business can take advantage of cloud computing while building in sensible exit and cloud-hopping strategies.