Tiered storage research meeting
From Wikibon
Storage Peer Incite: Notes from Wikibon’s June 26, 2007 Research Meeting
Moderator: Peter Burris & Analyst: David Floyer
This week Wikibon presents Tiered storage: Islands or bridges. With storage demand growing at an incredible rate and no end in sight, tiered storage's promise of up to 50% savings on lifetime storage costs makes it a very attractive alternative to continuing to throw disk at everything indiscriminately. In the mainframe world, tiered storage has been a reality for years, so why has it proven so elusive a target for the rest of the IT shop? The answer, of course, is in the heterogeneous nature of the storage architecture outside the mainframe world. The diversity of hardware, applications, technologies and storage architectures has created a barrier that so far has remained impenetrable despite the concentrated efforts of several vendors. The best solution has been to reformat all data into a single storage architecture, but this is an expensive process that has proven impractical for the vast majority of storage. And the file format is only one problem, and not the toughest. The real challenge blocking tiered storage is creating an effective categorization system across the IT environment based on the data access needs of each application and group of users. The conclusion of Wikibon's Peer Incite meeting this week is that instead of trying to build a single tiered storage system to cover everything, users should focus on large, rapidly growing pools of homogeneous data, such as email systems and software development and test data, and build islands of tiered storage around those pools. This can result in significant savings and is much easier to implement. The trade-off is that those pools will need to be bridged manually, with human effort. Dave Vellante
Tiered storage: Islands or bridges
Over the last 20 years, the huge growth in demand for storage at all levels has driven overall storage budgets skyward despite the dramatic, 30% per year, drop in the purchase price of disk storage. This has fueled strong interest from both suppliers and users in tiered storage solutions as a key to overall cost control.
Tiered storage based on IBM's Systems Managed Storage has long been a reality in the mainframe world. Outside the mainframe arena, however, the common practice of purchasing a different storage solution for each application has produced heterogeneous hardware, software, and file architectures, and this diversity has so far defeated attempts to impose a universal tiered solution. As a consequence, tiered storage still has effectively only 10%-15% penetration in this marketplace.
Nonetheless, the promise of 50% cost savings makes tiered storage an attractive goal. We believe that users need to acknowledge that a single universal solution is impractical and instead focus on homogeneous areas based on technology (Linux, Unix, Microsoft) and application types (software development, data warehousing, email) with large data pools growing at 40% to 50%.
This first step simplifies the resolution of the three main challenges of tiered storage:
- Defining storage service levels based on user and application needs;
- Creating a meaningful data classification structure based on those service levels that best meets the needs of users and applications within a tiered storage structure;
- Establishing a single point of control across each pool of data that controls the movement of data from tier-to-tier and maintains the metadata identifying the location of each data set for retrieval when and if that becomes necessary.
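The three challenges above can be sketched for a single homogeneous pool. This is only an illustrative sketch, not a description of any vendor's product: the tier names, the idle-time thresholds, and the `ServiceLevel`, `classify` and `PoolController` names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tiers; 1 is the fastest and most expensive."""
    TIER1 = 1
    TIER2 = 2
    TIER3 = 3


@dataclass(frozen=True)
class ServiceLevel:
    """Challenge 1: a service level stated in user and application terms."""
    name: str
    max_days_idle: int  # data idle longer than this no longer qualifies
    tier: Tier


# Ordered from most to least demanding; thresholds are made-up examples.
SERVICE_LEVELS = [
    ServiceLevel("active", 30, Tier.TIER1),
    ServiceLevel("reference", 365, Tier.TIER2),
]
ARCHIVE_TIER = Tier.TIER3  # everything that matches no service level


def classify(days_idle: int) -> Tier:
    """Challenge 2: map a data set onto a tier via the service levels."""
    for level in SERVICE_LEVELS:
        if days_idle <= level.max_days_idle:
            return level.tier
    return ARCHIVE_TIER


@dataclass
class PoolController:
    """Challenge 3: one point of control for a single homogeneous pool."""
    pool: str
    locations: dict = field(default_factory=dict)  # data set name -> Tier

    def place(self, dataset: str, days_idle: int) -> Tier:
        """Record (or update) where a data set lives and return its tier."""
        tier = classify(days_idle)
        self.locations[dataset] = tier
        return tier

    def locate(self, dataset: str) -> Tier:
        """Look up the current tier of a data set for retrieval."""
        return self.locations[dataset]


email = PoolController("email")
email.place("mailbox-2007-06", days_idle=3)
email.place("mailbox-2005-01", days_idle=800)
```

Each island would have its own controller of this kind; nothing in the sketch spans pools, which is where the human bridging effort comes in.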
We therefore do not expect to see a general-purpose tiered solution in the marketplace in the foreseeable future. Rather, we expect to see islands of tiered storage bridged by human-imposed management and labor.
Action item: Users should cease pursuing any dream of a general-purpose heterogeneous tiered storage solution and instead focus on sets of applications that can be made to look homogeneous at the storage level. Typically these applications will feature very fast growth rates (in excess of 40%-50%) and common techniques for classification, and can be classified under a single point-of-control.
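To see why pools growing at these rates are the natural first targets, it helps to convert the growth rate into a doubling time. The short calculation below is a sketch that assumes steady compound annual growth; the function name is ours.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a pool to double at a steady compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


for rate in (0.40, 0.50):
    print(f"{rate:.0%} growth: doubles in {doubling_time_years(rate):.2f} years")
# 40% growth: doubles in 2.06 years
# 50% growth: doubles in 1.71 years
```

At 40% to 50% annual growth a pool doubles roughly every two years or less, so any saving per unit of capacity compounds quickly.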
Tiered storage three step
Tiered storage will not be a function that is purchased, but rather a capability that is built. Moreover, despite most supplier claims to the contrary, tiered storage capabilities are unlikely to be generalized across highly divergent product groups; islands of tiered storage capability are the more likely scenario. Consequently, businesses and IT organizations seeking the benefits of tiered storage are likely to follow a three-step adoption process, heavily dependent on a firm’s willingness to classify assets and activities in storage terms. The first step is process and application classification, in which the data-access and sharing needs of application groups are codified in storage terms in an effort to identify requirements for "virtual pools" of storage. The effort here is focused on ensuring that storage for cross-application information is commonly managed, and not on trying to use I/O as a general-purpose method for application integration (which remains a middleware/database issue). The second step is to apply meaningful data classification semantics within each application group, so that a common set of tools and storage administration practices can be applied within each tiered storage "island." A three-tiered data schema, similar to that which has been successfully applied in the database world, comprising physical (device access), operational (system access), and logical (administrative tool access) levels, is an appropriate starting point. The third step is to implement the tiered storage island, with an eye to establishing appropriate administrative bridges among islands.
Action Item: The role and responsibilities of the storage administration function must evolve to include important storage-level data administration activities before organizations can consider implementing tiered storage solutions. Storage administrators in the evolved role will not own data administration tasks per se, but will be important participants in overall efforts to craft an information administration capability within the business.
Tiered storage: Let vendors integrate the technology
Several key technologies make up an effective tiered storage implementation: software and processes to ensure effective classification, software to non-disruptively transfer data-sets, and hardware to ensure integrity.
The most difficult technical issue is ensuring data integrity, especially when something goes wrong. For small networks, network-based appliances will work fine, but you have to do the integration. On large networks, strong architectural reasons mandate having the storage controller own all the data integrity and data movement issues.
A growing number of vendors offer in-the-box tiered storage solutions. The natural development of these is towards clustered controllers with significantly enhanced scope of control. Several of these are hitting the market now.
Action item: Focus on data-classification, not technology integration. For each of your key storage pools, pick a vendor that offers the best integrated homogeneous solution and go with it. Don’t try to find nirvana, a single heterogeneous solution across your data centers; if it ever happens, it will find you.
Tiered storage: Who owns classification?
The advice in the tiered storage technology integration alert was to focus on data classification. A large amount of tiered storage management software seemingly allows a lot of flexibility in developing data classification standards, but there are few practical how-to guides for getting it done.
The key to success seems to be to take an application view and ensure that there is agreement between the business and IT on the best storage technology fit. This requires an effective way to look at all the storage resources used by an application and measure the effect of moving the application through the tiers. This data, together with the relatively simple data of the number of users, amount of time used, and relative contribution to creating business value, should ensure that effective classification decisions can be made and maintained automatically.
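As a rough illustration of how those simple inputs could feed a repeatable classification decision, the sketch below combines number of users, time used, and business value into a single score. The equal weighting, the 1-to-5 value scale, the thresholds, and all names are assumptions for the example; in practice the business and IT would agree on them together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppUsage:
    """Per-application inputs; all figures here are invented examples."""
    name: str
    users: int
    hours_per_week: float
    business_value: int  # 1 (low) to 5 (high), as rated by the business


def score(app: AppUsage, max_users: int, max_hours: float) -> float:
    """Equal-weight average of the three normalized inputs, in [0, 1]."""
    return (
        app.users / max_users
        + app.hours_per_week / max_hours
        + app.business_value / 5
    ) / 3


def suggest_tier(s: float) -> int:
    """Map a score to a suggested tier using hypothetical thresholds."""
    if s >= 2 / 3:
        return 1
    if s >= 1 / 3:
        return 2
    return 3


apps = [
    AppUsage("order-entry", users=500, hours_per_week=60, business_value=5),
    AppUsage("test-builds", users=200, hours_per_week=30, business_value=3),
    AppUsage("old-reports", users=5, hours_per_week=1, business_value=1),
]
max_users = max(a.users for a in apps)
max_hours = max(a.hours_per_week for a in apps)

# Suggested tier per application; users keep the final say.
suggestions = {a.name: suggest_tier(score(a, max_users, max_hours)) for a in apps}
```

The output is a suggestion only: IT supplies the measurements, and the users decide whether the resulting tier is acceptable.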
Action item: Take a simple application-led approach and avoid the temptation to take on too much (e.g., an ILM classification strategy). Collecting any metadata is good and can be enhanced in the future. The job of classification belongs to the users. The job of providing the infrastructure of data services that can create the data for users to make the decision belongs to IT.
Show me the money for tiered storage buy in
The confluence of low-cost SATA devices, a large installed base of storage networks and very rapid storage growth has led many vendors, large and small, to embark on tiered storage strategies. Indeed, tiered storage is becoming the consolidation buzz of the 2000s. But the uptake is not nearly as dramatic. Tiered storage adoption has been hampered by justification concerns, complexity and politics.
Successful vendors in the tiered storage space will offer sets of services to assist in accelerating the acceptance of the approach. These include:
- Classification
- Migration
- Integration of tiered storage management software
- Monitoring/reporting/performance analysis
- Metering and chargeback services
Of these, classifying data is perhaps the most critical and is often cited as a logical starting point. However, as savvy business people observe, the natural reaction to teams of consultants setting up shop in conference rooms, armed with probing surveys, will be 'don't take my storage off tier 1.' This presents an obvious problem for IT, but the lines of business don't always share the problem.
Lack of transparency in chargeback models often leads business lines to dismiss storage savings as a ruse by IT to get more budget. Suppliers must recognize this dynamic early on in the process to avoid scope creep, examine possible changes to business processes and increase chances of success.
Action Item: Vendors of tiered storage solutions must recognize that tiered storage implementation is a labor-intensive process and one that requires relevant services to succeed. These services often need to address business process deficiencies first, before tackling the implementation issues.
Tier 3 processes are diamonds in the rough
Today’s tiered storage differs dramatically from past HSM (hierarchical storage management) implementations in that the emphasis now is on non-disruptive disk storage, whereas in the past tape-based strategies were predominant and highly disruptive. In addition, high-performance data movement (aka asynchronous, non-disruptive data movement), policy software and data classification approaches have evolved substantially to accommodate a world increasingly dominated by unstructured data growth.
Interestingly, a decade ago, it was common to make jokes about mainframe practices that apparently had little value left. RACF hairballs and VTOC slurs were good humor back then. It is ironic, then, to watch open systems develop and see the problems caused by the lack of critical information and associated processes. Easy access to metadata that lists the contents of data sets, their location, size, etc. would be an enormous convenience for storage administrators, but alas it doesn't come that easily in open systems.
Action Item: Storage administrators should look to mainframe best practices and tap the resident process expertise where possible as related to tiered storage, and especially tier 3. Take good storage administration practices surrounding HSM, SRM, archiving, backup, data movement, space reclamation, etc., and apply modernized tools to address today's open storage needs.
