In our research on integrating the storage optimization stack into storage arrays, Wikibon concluded that primary storage optimization technologies such as Permabit's Albireo de-duplication should be integrated into the storage array. However, the same technologies could also be integrated at other locations in the infrastructure, including:
- Integration into the virtualization layer:
- Products such as IBM’s SVC, HP’s SVSP, Hitachi’s USP and now EMC’s VPLEX have a virtualization layer, and can also virtualize heterogeneous arrays. For all those products except VPLEX, the system also delivers additional storage optimization functionality such as thin provisioning and space-efficient copies. This virtualization layer is a sensible place to integrate storage optimization technologies.
- Virtualized storage systems such as the NetApp WAFL system, systems from 3PAR and Compellent, and IBM’s XIV all include a virtualization layer and other space-efficient technologies. NetApp already includes the A-SIS de-duplication feature, for free! Implementation of space-saving features like de-duplication is usually significantly easier in a virtualized environment, as "data holes" can be more easily filled.
- In theory this technology could be implemented at the hypervisor level. This would add a lot of code to what is meant to be a thin layer. A more promising approach would be to implement thin copy techniques that allow nearly identical copies of desktop and server operating systems to share a single copy of duplicate data in the first place.
- Integration into the database stack:
- Some storage optimization techniques (such as Oracle’s Columnar Compression in Exadata) are included in the database stack. The advantage of placing storage optimization in the database stack is that data can be compressed at a higher level, and less data has to be moved around. This is particularly useful for distributed databases. The other advantage is that storage optimization can be applied before security technologies such as encryption. Optimizing after encryption is largely futile, because encrypted data looks random and neither compresses nor de-duplicates well.
- The disadvantage of this approach is that an increasing percentage of data is unstructured and not held in databases. In addition, the cost overheads of databases are significantly higher than the cost overheads of array software.
- Integration into the application stack:
- Large ISVs such as SAP and Microsoft are under increasing pressure to reduce the cost of the computing infrastructure needed to run their applications. With SAP’s purchase of Sybase and Microsoft’s ownership of SQL Server, it is likely that they will offer a software alternative. This could be particularly appealing for small and very small systems.
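The ordering point made under the database stack (optimize before encrypting) can be illustrated with a small experiment. This is a minimal sketch, not a description of any vendor's implementation: `toy_encrypt` is a hypothetical, insecure stand-in for a real cipher, the sample record is invented, and `zlib` stands in for whatever compression the stack actually uses.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustrative stand-in for a real cipher (an assumption of this sketch);
    it is NOT secure and is used only because its output looks random.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


def sizes(data: bytes, key: bytes):
    """Return (bytes stored when compressing first, bytes stored when encrypting first)."""
    compress_then_encrypt = toy_encrypt(zlib.compress(data), key)
    encrypt_then_compress = zlib.compress(toy_encrypt(data, key))
    return len(compress_then_encrypt), len(encrypt_then_compress)


# Hypothetical, highly repetitive database-style record.
record = b"customer_id=1001;region=EMEA;status=active;" * 500
optimized_first, encrypted_first = sizes(record, b"demo-key")

print("original bytes:        ", len(record))
print("compress, then encrypt:", optimized_first)
print("encrypt, then compress:", encrypted_first)
```

With repetitive input like this, compressing first should shrink the data dramatically, while compressing the encrypted form should yield essentially no saving. Real ciphers and real data will give different numbers, but the direction of the result is the point.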
The most obvious place to start with integrating storage optimization is in the storage array, as the array is almost always shared among many servers. With the advent of clustered storage controllers, this approach is likely to be the simplest to manage and the most cost effective. Other approaches will also have niches but are unlikely to gain major traction.
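To make the "shared among many servers" argument concrete, the sketch below compares de-duplicating each server's data in isolation with de-duplicating everything in one shared pool, as an array would. It is a simplified illustration under stated assumptions: fixed 4 KB blocks, SHA-256 fingerprints, and three invented server volumes that share an operating system image. It does not represent how Albireo, A-SIS, or any other product works internally.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def unique_blocks(volume: bytes) -> set:
    """Return the SHA-256 fingerprints of the distinct fixed-size blocks in a volume."""
    return {
        hashlib.sha256(volume[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(volume), BLOCK_SIZE)
    }


def stored_blocks_per_server(volumes: dict) -> int:
    """Blocks stored when each server de-duplicates only its own data."""
    return sum(len(unique_blocks(v)) for v in volumes.values())


def stored_blocks_shared(volumes: dict) -> int:
    """Blocks stored when one shared pool de-duplicates across all servers."""
    pool = set()
    for v in volumes.values():
        pool |= unique_blocks(v)
    return len(pool)


# Hypothetical data: a 20-block OS image common to every server,
# plus one block of application data unique to each server.
os_image = b"".join(bytes([n]) * BLOCK_SIZE for n in range(20))
volumes = {
    name: os_image + bytes([100 + i]) * BLOCK_SIZE
    for i, name in enumerate(["server-a", "server-b", "server-c"])
}

print("per-server de-duplication:", stored_blocks_per_server(volumes), "blocks")
print("shared-pool de-duplication:", stored_blocks_shared(volumes), "blocks")
```

In this contrived case the per-server approach stores 63 blocks while the shared pool stores 23, because the common OS image is kept only once. The gap in practice depends entirely on how much data the servers actually have in common.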
Action Item: There will be a rich set of alternatives for implementing storage optimization, but the initial focus should be the storage arrays. Look for storage vendors of high integrity and reputation that can perform the technology integration and testing for you. Ensure these vendors have in place the vision, infrastructure and roadmap to get the job done.
The only alternative not to take is do-it-yourself integration, unless there is a compelling short-term business case.