Data deduplication and compression technologies are evolving to become more deeply embedded components of storage infrastructure. By taking an end-to-end technology stack approach and incorporating storage optimization intellectual property directly into array functionality, vendors will make optimization a basic feature of primary storage infrastructure as well as backup and archiving use cases, following the same path that data compression and encryption took before it.
Storage Optimization Techniques
Storage optimization falls into two main categories:
- Data de-duplication – finding identical strings of data and replacing the duplicates with a pointer to a single stored copy
- Data compression – compressing data using either a single compression technique or selecting the optimal technique from many. Compression can be lossy (data is not restored exactly, e.g., a JPEG file) or lossless (no loss of data). Lossless compression is essential for most data processing.
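Both categories can be illustrated in miniature. The sketch below is illustrative only – the 4K block size, SHA-256 hash, and in-memory "store" are assumptions for the demonstration, not how any particular array implements it. Fixed-size blocks are hashed, duplicates are replaced with a pointer to a single stored copy, and zlib demonstrates a lossless round trip:

```python
import hashlib
import zlib

def dedup_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks; store each unique block once
    and keep an ordered list of hash 'pointers' to reconstruct the data."""
    store = {}     # hash -> unique block (stored once)
    pointers = []  # ordered hashes standing in for the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)   # only the first copy is kept
        pointers.append(h)
    return store, pointers

def rehydrate(store, pointers):
    """Reassemble the original data from pointers and the block store."""
    return b"".join(store[h] for h in pointers)

# Invented sample data with repeated content: 4 logical 4K blocks,
# but only 2 distinct ones.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, ptrs = dedup_blocks(data)
assert rehydrate(store, ptrs) == data   # lossless round trip
assert len(ptrs) == 4 and len(store) == 2

# Lossless compression: decompressing restores the data exactly.
compressed = zlib.compress(data)
assert zlib.decompress(compressed) == data
```

In a real system the pointer table and block store are the optimization metadata whose security and recoverability must be managed, as noted below.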
These storage optimization techniques are well established. The benefits are:
- Lower costs of storage
- Faster and lower cost of transporting data within the storage system or over the network
Nothing comes without costs, which are:
- Processor power to dehydrate and re-hydrate data
- Management of security of storage optimization metadata
- Elapsed time overhead on data dehydration/re-hydration
The highest levels of compression occur in de-duplication of backup streams, where de-duplication ratios of 20:1 are possible. With good backup software (e.g., using incrementals forever), the ratio drops to between 5:1 and 10:1, but that still delivers very good savings, which Data Domain (now owned by EMC) has exploited well.
De-duplication of on-line data cannot use the same techniques, as the elapsed time overhead is far too high. However, NetApp with A-SIS has provided a simple way of identifying 4K blocks at the time they are written, and then performing a post-process de-duplication to remove the common blocks. This can achieve about 30-50% savings in storage. More efficient techniques for performing the hashing and lookup in under 20 microseconds have been announced by Permabit, which will allow "pure" in-line de-duplication to be done before the data is actually written to disk.
Companies such as Storwize, Permabit, and Ocarina have historically used sophisticated compression techniques. Ocarina provides the highest levels of compression by selecting the optimum compression technique, and even by re-hydrating media files and recompressing them with better algorithms. This requires large amounts of processing power and elapsed time, and is not suitable for on-line data. Storwize and Permabit have focused on providing very fast techniques for compression and de-compression (sub-millisecond) that allow this to be done in-line, with very little impact on storage performance but with less aggressive compression ratios.
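The trade-off between fast, in-line compression and aggressive, slower compression can be seen even with a general-purpose codec such as zlib. This is a rough sketch, not a model of any vendor's engine: compression level 1 stands in for the fast in-line style, level 9 for the aggressive style, and the repetitive payload is invented for the demonstration.

```python
import time
import zlib

# Invented, repetitive payload so that compression has something to find.
payload = b"sensor_reading,timestamp,value\n" * 2000

for level in (1, 9):  # 1 ~ fast/in-line style, 9 ~ aggressive style
    start = time.perf_counter()
    out = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"level {level}: {len(payload)} -> {len(out)} bytes "
          f"in {elapsed_ms:.3f} ms")
    assert zlib.decompress(out) == payload  # lossless: exact round trip
```

Higher levels spend more CPU time searching for matches in exchange for (usually) smaller output, which is exactly the dial that separates in-line from post-process optimization.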
The Issues for End-users
End-users currently have to choose from a series of point solutions. Each works well, but the work of integrating these solutions, managing risk and recovery and scaling falls on the end-user. An integrated stack is necessary for wide-scale adoption of storage optimization.
Storage Optimization Integration Requirements
The key requirements for an integrated storage optimization stack are:
- Both compression and de-duplication technologies integrated into the storage array infrastructure
- The technologies are additive, and are not in competition
- The storage array (or software equivalent) is a central point for management of storage metadata and recovery
- There is a fast optimization option that impacts the storage access time by less than 1 millisecond for traditional disk storage, and less than 25 microseconds for flash storage
- The storage optimization capabilities apply to both file and block-based storage
- The choice of storage optimization techniques is driven automatically by observed usage patterns, or by the volume classification (end-user override)
- The optimized data can be moved to other parts of the storage network without being re-hydrated (saving storage, re-hydration time and effort, and bandwidth)
- Storage backup, archiving and tiering software are able to optimize by selecting the most appropriate optimization technology for their environment
- Re-hydration can be accomplished by free software, to allow recovery of data without any proprietary hardware if necessary
Action Item: Storage optimization will become a basic capability by 2011. Companies like Asigra are already providing integrated storage optimization solutions in specific areas such as backup. The recent announcement of Albireo from Permabit illustrates the trend that storage optimization technology is being made integration-friendly for storage array OEMs to provide under the covers. CIOs and storage executives should expect to see significant announcements in 2010 from all the major storage players as they integrate technologies from multiple sources. CIOs should ensure that their storage vendors are working aggressively to meet the requirements defined above.
Footnotes: Peer Incite on Storage Optimization - June 8, 2010 at Noon ET.