Originating Author: Bill Morttram
Percolating through the many presentations at EMC World was the notion that data reduction is a key strategy for addressing the ominous data explosion, which, according to Tucci, is expected to exceed 40% CAGR, even in today’s economic turmoil. EMC sees data reduction technology not simply as a part of storage efficiency but also as a significant component in its data protection strategy. This was nicely encapsulated in Tucci’s introductory comments that positioned data de-duplication as the technology that makes D2D back-up affordable.
The interesting twist in EMC’s perspective is the blending of its tiering strategy, known as FAST (Fully Automated Storage Tiering), with data reduction technologies including compression, file-level (single-instance) de-duplication, and sub-file-level de-duplication, applied at both the source and the target. The EMC vision moves away from point products, such as Data Domain, and calls for re-architecting traditional back-up strategies and methodologies so that data reduction technologies are integrated across the storage infrastructure. It is ambitious, but if EMC can pull it off, it will be impressive.
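To make the distinction concrete, here is a minimal sketch of the difference between file-level (single-instance) and sub-file-level de-duplication. This is not EMC's implementation; the function names, SHA-256 hashing, and fixed 4 KB chunking are illustrative assumptions. File-level de-duplication stores one copy per unique file, while sub-file-level de-duplication stores one copy per unique chunk, so it also catches redundancy between files that differ only slightly.

```python
import hashlib

def file_level_dedup(files):
    """File-level (single-instance) de-duplication: keep one copy of each
    unique file payload, keyed by its content hash."""
    store = {}
    for data in files.values():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # identical files collapse to one copy
    return store

def subfile_level_dedup(files, chunk_size=4096):
    """Sub-file-level de-duplication: split each file into fixed-size chunks,
    keep one copy of each unique chunk, and record a per-file 'recipe' of
    chunk hashes needed to reassemble it."""
    chunk_store, recipes = {}, {}
    for name, data in files.items():
        hashes = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(digest, chunk)
            hashes.append(digest)
        recipes[name] = hashes
    return chunk_store, recipes
```

Real products typically use variable-length, content-defined chunking rather than the fixed offsets shown here, so that an insertion near the start of a file does not shift every subsequent chunk boundary.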
Avamar is EMC's lead product, and last year it had the distinction of being the fastest-growing EMC product. Its IP has appeared in a number of products, with NetWorker announcing de-duplication on 5/19 and Celerra supporting no-cost, file-based de-duplication, available since February; both are based on Avamar IP. My question is whether de-duplication is becoming commoditized. This would mean that its future as a point product is limited as it morphs into a standard array feature such as snapshots. This happens to be my perspective, and it explains NetApp's recent move to acquire Data Domain. Another possibility, perhaps supported by the recent EMC announcement of its data de-duplication assessment service, is that de-dupe will become a service.
During one of the breakout sessions the following questions were asked:
- Who is having bandwidth issues? No one responded.
- Who was meeting their back-up (BU) windows? Not one positive response was received.
When challenged about the incongruity of their responses, the audience had an “ah ha” moment. The bottom line is that a gap exists in the understanding of the full value of data reduction, and particularly of data de-duplication. The notion that source-based data reduction reduces the volume of data that has to move over the wire, and hence significantly reduces back-up windows, is apparently not as obvious as often assumed. This attribute of source-based data de-duplication is one of its key advantages.
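A minimal sketch of why this is true, assuming fixed-size chunking and SHA-256 hashing (the helper names and the in-memory target index are illustrative, not any vendor's protocol): the source hashes each chunk and transmits only chunks whose hashes the target has never seen, so a second full back-up of largely unchanged data moves almost nothing over the wire.

```python
import hashlib
import os

CHUNK_SIZE = 4096

def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Split a back-up stream into fixed-size chunks and hash each chunk."""
    return [(hashlib.sha256(data[i:i + chunk_size]).hexdigest(),
             data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def source_side_backup(data, target_index):
    """Send only the chunks whose hashes the target has never seen.

    target_index stands in for the index of chunk hashes already held at the
    back-up target; in a real product that lookup happens over the network,
    so unknown chunks are the only payload that crosses the wire."""
    bytes_sent = 0
    for digest, chunk in chunk_hashes(data):
        if digest not in target_index:
            target_index.add(digest)  # "transmit" and index the new chunk
            bytes_sent += len(chunk)
    return bytes_sent

# A second, nearly identical full back-up moves almost no data over the wire.
target_index = set()
first_backup = os.urandom(1_000_000)
second_backup = first_backup + b"small change at the end"
print(source_side_backup(first_backup, target_index))   # ~1,000,000 bytes
print(source_side_backup(second_backup, target_index))  # only the final, changed chunk
```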
EMC has a strong vision, and despite proof-point successes such as Nationwide (which reduced its back-up window from 48 hours to 8) and Celerra's file-based implementation, it will take some time before the results of this integrated approach become apparent.
Action Items:
- When adopting data de-duplication as a standard BU solution, a point solution is probably still the solution of choice; several options offer very competitive performance characteristics.
- For incremental de-duplication, source-based solutions would be the preference; this is where the EMC approach is probably strongest.