Before the Federal Rules of Civil Procedure (FRCP) were amended in December 2006, the landmark case Zubulake v. UBS Warburg provided something of a safe harbor for companies unable to retrieve Electronically Stored Information (ESI) that was inaccessible “within the normal course of business,” and it supported the disclosing party’s ability to shift the cost of restoring “inaccessible” backup tapes to the requesting party. The ruling also addressed important eDiscovery issues around the preservation of email and gave lawyers and courts new best practices covering both the legal and technical aspects of electronic discovery. See EDiscovery Woes: Not all the Vendors Fault
In many large enterprises, tape has been an integral part of daily, weekly and monthly backups for decades, and it will most likely remain so for the foreseeable future because of its low cost per gigabyte, its “greener” carbon footprint compared with spinning disk, and its continued popularity for disaster recovery (DR) applications. It is therefore not unusual to find large companies with massive tape libraries measuring into the thousands of tape drives.
However, while tape is unlikely to disappear for deep archiving, backup and DR purposes, it is poorly suited to the growing eDiscovery and data culling workloads that enterprises face. Courts and savvy litigators have become more aware of technologies that make ESI on tape more reasonably discoverable. Most tape libraries are poorly indexed in any case, so searching deep archive data for any reason can be expensive and time consuming. Corporate legal departments generally exacerbate the problem by waiting for litigation that includes tape discovery before they seek a solution.
Today, data once deemed inaccessible is much more easily retrieved from tape backups or so-called deep ESI archives, thanks to a number of solutions and services in the marketplace. Several forensic tape restoration services, such as RenewData and eMag, charge as much as $900 per gigabyte, or more. This is a reasonably cost-effective approach when only a few tapes, or perhaps up to a few hundred, are involved. Meanwhile, in-house solution providers such as StoredIQ and Kazeon extract, index, classify, and deduplicate ESI from deep archives, but they must first restore the data to disk. Index Engines is the only in-house solution that can build an index directly from both offline (tape) and online (network) data.
Approach
Index Engines has a patent-pending, sequential, batch-oriented index-building technology, delivered as an appliance, that understands multiple complex backup formats (TSM, NetBackup, NetWorker, Backup Exec, CommVault, etc.) and scans data at the “speed of tape.” It can also index data stored on disks and file shares at speeds of up to 1 TB per hour using a single node. This full metadata and content indexing capability, which uses the hash values of objects, provides deduplication and intelligent culling, as well as extraction of relevant data to a records management solution or archive, which can prove very useful for early case assessment.
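To make the hash-based deduplication idea concrete, here is a minimal sketch in Python. It is not Index Engines' implementation, which is not described in any detail here. It assumes SHA-256 as the hash, and the function names and sample tape paths are hypothetical. Objects with identical content produce the same hash, so only the first copy needs to be indexed or extracted.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a hex digest identifying the object's content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(objects):
    """Split (path, bytes) pairs into unique objects and their duplicates.

    Returns a list of the first-seen path for each distinct content hash,
    and a dict mapping that first-seen path to the paths of later copies.
    """
    first_seen = {}                 # hash -> path of the first copy
    duplicates = defaultdict(list)  # first-copy path -> later copies
    for path, data in objects:
        digest = content_hash(data)
        if digest in first_seen:
            duplicates[first_seen[digest]].append(path)
        else:
            first_seen[digest] = path
    return list(first_seen.values()), dict(duplicates)


# Hypothetical sample: the same email backed up on two different tapes.
sample = [
    ("tape01/mail/a.msg", b"Q3 forecast attached"),
    ("tape02/mail/a.msg", b"Q3 forecast attached"),
    ("tape02/docs/b.doc", b"Draft contract"),
]
unique, dupes = deduplicate(sample)
```

In this sketch only two of the three objects would move on to review, and the duplicate map records where the redundant copy lives in case it has to be produced later.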
The solution allows for proactive tape remediation by eliminating unnecessary tapes, and it has a relatively small index footprint that averages about 5% of the data indexed. The solution takes “two passes” at the data: the first quickly catalogs the data, and the second builds the full-text index. Searches and queries run very quickly because they are answered from the index rather than from the primary archived data. Index Engines also supports index and search of up to one billion data objects in a single engine. According to the company, full text and metadata are indexed in "250 million object segments and, through a centralized index, supports 1 billion objects to be fully queried and extracted within one appliance." See the related press release for full details.
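The two-pass idea can be illustrated with a small Python sketch. This is an assumption-laden toy, not the vendor's design: the function names, the in-memory dictionaries, and the sample documents are all hypothetical. The first pass records only metadata, the second builds an inverted index of terms, and a query is then answered from the index alone without rereading the archived data.

```python
import re
from collections import defaultdict


def catalog_pass(objects):
    """Pass 1: record metadata only (path and size in bytes) for each object."""
    return {
        oid: {"path": path, "size": len(text.encode("utf-8"))}
        for oid, (path, text) in enumerate(objects)
    }


def full_text_pass(objects):
    """Pass 2: build an inverted index mapping each term to the ids of objects containing it."""
    index = defaultdict(set)
    for oid, (_path, text) in enumerate(objects):
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(oid)
    return index


def search(index, catalog, term):
    """Answer a query from the index and catalog alone, returning sorted paths."""
    return sorted(catalog[oid]["path"] for oid in index.get(term.lower(), ()))


# Hypothetical archived objects as (path, extracted text) pairs.
objects = [
    ("tape01/a.txt", "Merger terms revised"),
    ("tape02/b.txt", "Lunch menu"),
    ("tape03/c.txt", "Final merger memo"),
]
catalog = catalog_pass(objects)
index = full_text_pass(objects)
hits = search(index, catalog, "Merger")
```

Splitting the work this way means a catalog is available quickly for tape-level decisions, while the more expensive full-text pass can be limited to the data that survives culling.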
Index Engines refers to its methodology as “Retroactive Records Management”: cull early by eliminating system file tapes, incremental backups, and tapes that fall outside the relevant timeframes. The company accurately points out that much of the data on tape is duplicated. Given that restoring data is the largest cost associated with pre-eDiscovery activities, more firms are considering this approach. Index Engines claims it can install its tape appliance starting at $150K; the online solution starts at $75K.
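The culling rules described above can be expressed as a simple filter. The following Python sketch is illustrative only: the `Tape` record, the `kind` labels, and the sample tapes are hypothetical, and real culling decisions would depend on the backup software's catalog and on legal guidance. It keeps only full backup tapes whose date falls inside the period of interest and sets aside system tapes, incrementals, and out-of-range tapes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tape:
    label: str
    kind: str          # "full", "incremental", or "system" (hypothetical labels)
    backup_date: date


def cull(tapes, start, end):
    """Return (keep, drop) lists of tape labels.

    A tape is kept only if it is a full backup dated within [start, end].
    """
    keep, drop = [], []
    for tape in tapes:
        relevant = tape.kind == "full" and start <= tape.backup_date <= end
        (keep if relevant else drop).append(tape.label)
    return keep, drop


# Hypothetical tape inventory.
tapes = [
    Tape("T001", "full", date(2007, 3, 31)),
    Tape("T002", "incremental", date(2007, 4, 1)),
    Tape("T003", "system", date(2007, 4, 1)),
    Tape("T004", "full", date(2003, 1, 31)),
]
keep, drop = cull(tapes, start=date(2006, 1, 1), end=date(2007, 12, 31))
```

Here only one of the four tapes would need to be indexed or restored, which is where the cost savings of culling early would come from.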
Futures and Concerns
Index Engines is a small firm with fewer than 50 people that addresses the needs of very large enterprises with massive tape volumes, some numbering 20,000 tapes or more. This is its key differentiator. It remains to be seen whether the company can build out additional functionality to become more of a mainstream player in the eDiscovery and early case assessment space. Areas for improvement could include analytics and pre-review capabilities, or closer partnerships with players already in that space, such as Clearwell, CommVault or Mimosa.
Bottom Line
There is no doubt that proactively restoring or staging ESI for anticipated eDiscovery activities is much more cost effective than reacting to individual requests for ESI stored in deep archives on tape. Index Engines appears to have a unique approach that will benefit those with very large numbers of tapes.
Action Item: Enterprises with large volumes of tapes containing deep archival information that could be used for eDiscovery or knowledge management purposes should review the Index Engines solution to see whether it fits in their portfolio of products for enabling proactive eDiscovery activities.
Footnotes: Index Engines Case Study