With contributions from Michael Versace
By 2012, 75% of enterprise IT organizations will have implemented strategic data reduction initiatives designed to reclaim wasted storage space, reduce risk and eliminate unwanted data.
Suffice it to say, the ever-increasing need to create, ingest, manage, and store many content types (documents, email and other forms of messaging, images, video, voice, and Web content, as well as structured database content) is pushing the limits of traditional storage capacity, planning methodologies, and data center budgets.
Compliance and regulatory requirements enacted over the last decade, along with changes to the Federal Rules of Civil Procedure (FRCP), have greatly changed the way organizations store and “discover” their data. In response, the past several years have been punctuated by a steady stream of mostly point solutions from the vendor community, each addressing a specific information management pain point: data authentication, classification, deduplication, encryption, indexing, migration, and search.
Too Many Point Solutions
For the most part, these point solutions have provided much-needed relief for organizations with large and growing volumes of data, particularly in the areas of compliance, data reduction, and litigation support, while helping to improve storage utilization and backup windows. However, with data growth in the average enterprise expected to double every 18 months or so, these mostly reactive, finger-in-the-dike approaches will at some point run their course.
Meanwhile, integration costs can be steep, often 5-to-10 times the initial software license fees when connectors to key data repositories are required. Many archiving, ediscovery, and search point solutions, some delivered as appliances by vendors such as Barracuda, Clearwell, Google, and Kazeon, require and maintain their own siloed repositories, and some manage no more than one or two content types. Deduplication solutions such as Avamar and Data Domain (both now owned by EMC) and ProtecTier (now part of IBM) began as standalone products and are being integrated into their respective product portfolios.
Scalability is also a concern for users and data center managers: data volumes in large organizations have already reached the multi-Petabyte range, and almost all of the solutions in this space were architected and developed when data volumes were far smaller than they are today. In addition, the need for speed in information retrieval, discovery, and enterprise search is pushing existing solutions to their limits in meeting evolving business requirements.
Data Cleansing is Part of Data Reduction
As the requirements for data reduction are redefined by virtual computing and storage and by growing storage management costs, one of the next big controversies will be how to achieve proper data cleansing as part of data reduction efforts. The challenge for companies is not only to keep data confidential, secure, and backed up, but also up-to-date, accurate, and relevant. As massive amounts of data accumulate across the enterprise and data reduction plans are operationalized, the risk grows of data deletion errors and of data that is out of date, out of sync, inaccurate, or no longer relevant to the business.
Need for Improved Archival Storage Functionality
Going forward, evolving user requirements and improved information governance practices will demand that vendors add functionality, improve interoperability and scalability, and meet user expectations of lower cost, easier implementation, and higher availability. Storage that specifically addresses active, passive, and deep archival requirements should include:
- Authentication, encryption, or WORM for security and compliance (digital signature, hash algorithms);
- Data compression, deduplication, and single instancing for storage optimization (block and/or file level);
- Data protection, remote replication for back up and recovery (snapshots);
- Public and private cloud offerings to meet service provider and user capacity planning requirements (hosted or hybrid, multi-tenancy);
- Embedded full text index, search and retrieval of archived content for easy discovery (federated or unified search and index);
- Metadata support for unstructured or structured data storage (blobs for content types or relational DB data);
- Standards-based interfaces, support, and open APIs for improved interoperability (CMIS, UDBC, XAM);
- High availability features to protect against single node failure (Grid, RAIN);
- Scalability and affordability for active archival workloads into the multiple Petabyte range (2TB to 10 plus PB, $1 per GB range);
- Lower power consumption and reduced footprint for lower cost (virtualization, power management); and,
- Tiered storage with reporting capabilities and advanced features (self healing, data movement between tiers).
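Several of the storage-optimization and security capabilities above — deduplication, single instancing, and hash-based authentication — rest on the same underlying idea: addressing content by a cryptographic fingerprint so that identical data is detected and stored only once. The sketch below (a hypothetical, simplified illustration in Python, not any vendor's implementation) shows block-level deduplication with SHA-256 fingerprints:

```python
import hashlib


class DedupStore:
    """Toy block-level deduplication store: each unique block is kept
    exactly once, keyed by its SHA-256 fingerprint (single instancing).
    Illustrative only; real products add compression, persistence, and
    collision-safety engineering on top of this idea."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # fingerprint -> block bytes (unique blocks only)
        self.files = {}    # file name -> ordered list of fingerprints

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            # Only previously unseen blocks consume new space.
            self.blocks.setdefault(fp, block)
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name):
        # Reassemble the file from its block recipe.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())
```

Writing two identical files (or many files sharing common blocks) stores the shared blocks once; the same fingerprints can double as integrity checks, which is why hash algorithms appear under both the security and the optimization bullets above.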
Several established storage vendors in the enterprise space, including EMC, HDS, HP, and IBM, as well as newer players such as NetApp, Nexsan, and Permabit, are addressing these needs by building “suites” of archival functionality into their offerings. HDS, for example, positions its Hitachi Content Platform (HCP) as “An Active Archive platform for the long-term preservation, optimization, and discovery of business-critical digital unstructured content.” HCP takes a bottom-up archival storage approach, which obviates the need for a dedicated compliance- or archive-specific storage repository, and it offers a federated data discovery suite for indexing and search. It:
- Maintains a large distributed object-store for unstructured content,
- Supports multi-tenancy and namespaces to securely segregate content,
- Enables each tenant and namespace to have different attributes,
- Abstracts details of back-end storage subsystems,
- Homogenizes storage for users and applications,
- Creates a single cluster for archive, cloud and data protection targets simultaneously,
- Partners with archival vendors (Symantec, Commvault, Mimosa, Optim).
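HCP's internals are proprietary, but the tenant-and-namespace segregation model described above can be illustrated with a small hypothetical sketch (Python; the class and method names are invented for illustration, not HCP's API). Every object is addressed by tenant, namespace, and key, so a lookup can never cross a tenant boundary:

```python
class ArchiveStore:
    """Toy multi-tenant object store: content is partitioned by
    (tenant, namespace), mimicking the segregation model described
    above. Illustrative only; not a real product's interface."""

    def __init__(self):
        self._data = {}  # (tenant, namespace) -> {key: object bytes}

    def put(self, tenant, namespace, key, value):
        self._data.setdefault((tenant, namespace), {})[key] = value

    def get(self, tenant, namespace, key):
        # A lookup never crosses tenant or namespace boundaries:
        # the same key under another tenant is simply not found.
        try:
            return self._data[(tenant, namespace)][key]
        except KeyError:
            raise KeyError(f"no object {key!r} in {tenant}/{namespace}")

    def list_namespaces(self, tenant):
        # Each tenant sees only its own namespaces.
        return sorted(ns for (t, ns) in self._data if t == tenant)
```

Because each namespace is an independent partition, per-namespace attributes (retention policy, indexing, replication) can vary without affecting other tenants, which is the point of the "different attributes" bullet above.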
Users are expecting more bundled features and functionality in their archival software and information management infrastructure components, especially with storage that is an integral piece of an archival solution. Forward-thinking storage vendors are moving towards the delivery of functionality suites that bring together critical capabilities that in the past were generally available only through the integration of point solutions that added to the overall cost and complexity of an archiving implementation.
Action Item: To mitigate costs and increase the efficiency of handling workloads directly tied to the management of unstructured content, such as ediscovery, customer service, document, and message management applications, users and buyers should strongly consider only those vendors demonstrating the ability to design, build, and deliver the next generation of archival storage suites. These should contain the features and functions needed to meet users' information availability, compliance, scalability, security, time-to-market, and other business-critical requirements, while avoiding point solutions whenever possible.