Storage Peer Incite: Notes from Wikibon’s August 14, 2007 Research Meeting
This week Wikibon presents Archive uses and abuses beyond e-mail. A string of high-profile civil suits resulting in judgments ranging from hundreds of millions to billions of dollars against such large financial houses as Citi and Morgan Stanley have hinged on the defendants' inability to produce e-mails subpoenaed by the court. Judges in these and other cases have proven impatient with excuses about lost or destroyed archives of these unstructured documents. This has put e-mail archiving in the cross-hairs of legal firms representing former employees, stockholders and customers -- virtually anyone looking to initiate legal action against any public or private company. Archiving, originally viewed as backup in case of a computer failure, has become a business requirement for defense against such suits. Instead of being put on tape, stored in a closet and forgotten, e-mail archives today need to be cross-indexed, organized, and readily available in case they need to be produced, perhaps several years hence. Digital data is specifically mentioned as being discoverable in the latest revision of the U.S. Federal Rules of Civil Procedure, implying that what is true of e-mail today will become true tomorrow for other forms of unstructured data including written documents, spreadsheets, and potentially instant messages.
As a result, corporate data administrators are caught in a race to solve the problem of archiving vast amounts of e-mail so that any message can be quickly located and produced, years later if needed, for any potential legal action. They badly need a technical solution that automates the application of business rules to these huge, ballooning archives, both to ensure that e-mails can be found and, equally, to delete them as soon as legally permissible. And this technology needs to be extendable to other forms of unstructured data as well.
Wikibon believes that archiving should go beyond the requirements to produce e-mails in response to legal actions. Archives need to be readily available so corporate officers can proactively head off potential problems. For instance, HR needs to search them to identify sexual or other kinds of abuse before a situation reaches a court. Sales executives need to monitor e-mail for signs of contract abuse and fraud by sales people.
The trend toward including e-mail in the discovery phase of legal actions is having an impact on IT budgets, and in cases where IT fails to meet court deadlines it can have a terminating effect on the careers of CIOs and storage administrators. However, constructing an adequate archive will be a challenge for many companies. This presents a golden opportunity for service providers who wish to specialize in archiving unstructured data. Many organizations will look for such specialists to lift the burden of archiving from them and provide a guarantee that in the event of legal action they can meet the court's discovery requirements for unstructured data without providing material not required that may lead to fishing expeditions by lawyers.
Bert Latamore
In the last six months our interactions with Wikibon members have highlighted the increasing role of e-mail archiving in storage planning and budgeting and ultimately the determination of the role storage will play in business. Over that time we have seen a growing emphasis on minimizing business exposure to sometimes frivolous-seeming but potentially very expensive lawsuits focusing on the degree to which the organization successfully secures and manages its e-mail archiving systems. This has focused corporate legal concern on increasing control of how e-mail data is retained and erased, based on sound business rules.
However, we believe this is just a first phase of activity in the area of archiving unstructured data. The concern over liabilities resulting from the inability of corporations to produce relevant e-mail required by the courts is already leading to the formation of better e-mail practices. Over the next few years this concern will drive the creation of active archives that will be able to sustain complex, high-quality applications focused on information asset and liability management. We expect this focus to broaden beyond e-mail to include other forms of unstructured data. We are already seeing the appearance of automated tooling for archiving any unstructured data, supporting the transfer of practices developed for e-mail to other unstructured data domains typically associated with the same quality of metadata.
This concern about liability will dominate many of the discussions about storage administration funding and direction in the next few years. Eventually businesses will expect storage administrators to create a unified unstructured data archive management environment supporting active archives against which high value applications can run to provide critical business capabilities. We expect the emergence of a new class of experts within business focused on aggregate considerations of information asset value and liability negation. This new “records management” function will work closely with storage administration but will probably be independent of it, working more closely with overall business operations.
The concern about litigation and the technical evolution it drives also have important implications for how storage administrative functions are performed and sourced from third parties. On the one hand, this creates a significant opportunity for outsourcers who make themselves experts in the technical side of creating, documenting and implementing administrative controls on storage that mitigate the risk for customers. Many organizations will have a difficult time keeping up with the technical and architectural developments of active archiving and will be happy to turn this problem over to expert service suppliers. On the other, corporations will be concerned about what the third party will do in the event that a judge threatens it directly with liability if it does not turn over all requested documents immediately.
Action item: Storage administrators in the midst of efforts to gain control over e-mail archiving applications will find themselves in an arms race with external legal groups who regard these archives as a potential gold mine for claims. The only way to gain control in this race is to work closely with operations and transfer knowledge gained in the e-mail domain to other unstructured data domains in ways that foster the emergence of overall integrated processes, controls and solutions for creating, administering and managing active archives.
Until recently, the business context for archiving applications has remained remarkably consistent: move old data from costly "online" media (usually disk) to less costly "offline" media (usually tape). Variations on this basic archiving theme have emerged (e.g., "nearline" automated tape libraries and optical jukeboxes), but the basic economics of archiving have been consistent for nearly 40 years, largely because the basic business requirements for archiving haven't changed.
However, the emergence of new regulatory and compliance regimes such as e-discovery is forcing business to revamp requirements for old data, which is fundamentally altering the scope of archiving applications. Archives now must be accessible by applications other than the toolset performing the archive and the initial application that created the data being archived. "Active archives," featuring shared access to archived data from multiple applications, are becoming an integral part of an organization's dynamic data infrastructure. However, archives must retain critical business attributes, such as data fidelity, auditability, and financial flexibility. Consequently, a new class of IT activities is emerging around active information archives. The need to support archive sharing in active archive infrastructures is compelling IT to experiment with a management framework that recalls the three-schema approach to database management: records management to handle "conceptual" access to active archives, archive administration to handle "physical" access to archives, with "logical," tool-oriented access being worked out on a case-by-case basis.
Action Item: Storage administrators must update their perspective on what constitutes archiving competence to accommodate rapidly emerging business requirements for active archives. Increasingly, "records management" activities, which share more with application development than hardware administration, will drive archive management decision making.
E-mail archiving processes are evolving as are the roles and responsibilities of those involved in planning and executing systems to reduce corporate exposure and mine information as an asset. E-mail storage administration has been in react mode in recent months, especially as changes to the Federal Rules of Civil Procedure, adopted in December 2006, now force civil litigants to consider electronic evidence as part of the discovery process.
But as e-mail archives evolve from static (e.g., .pst repositories) to active (e.g., e-mail archives with relationships intact) to active archives for other unstructured content, so too will storage administration evolve from reactive to proactive management of data. Tensions will undoubtedly escalate between different constituencies in the organization as this occurs. Specifically, while initially there is an alliance between IT and the legal function (serving legal, audit and compliance with sensitivity to business users), there is a clash looming between these constituents, records management and storage administration.
Indeed, as the records management function becomes more prominent, the desire to retain information will be counterbalanced with the need to eliminate information with certainty, reduce complexity and reclaim unused storage space. This re-emergent records management function will likely be responsible for establishing and enforcing policies around creation and management of classification metadata and migration, retention and deletion policies. As such, this new records management function will have a high degree of autonomy but must work closely with both IT and business lines.
Action Item: IT in general and storage administration in particular should recognize that they will be required to administer and ultimately integrate unstructured content archives broadly across the organization. As with other cross-functional, high technology-oriented but business-driven initiatives (e.g., disaster recovery), CIOs should ideally have direct reporting authority over, or at least strong matrix influence over, the policies, procedures and practices associated with e-mail and other unstructured content archives and should propose organizational structures that reflect these responsibilities.
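To make the idea of classification metadata driving retention and deletion policy concrete, here is a minimal sketch of how such a rule might be evaluated. Everything in it is an assumption for illustration: the record classes, the retention periods, the `legal_hold` flag and the "review" fallback are hypothetical and do not describe any particular product or legal requirement.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep items of one record class for a fixed period."""
    record_class: str
    retain_days: int


@dataclass(frozen=True)
class ArchivedItem:
    """Hypothetical archive entry carrying its classification metadata."""
    item_id: str
    record_class: str
    created: date
    legal_hold: bool = False  # assumption: a hold always overrides deletion


def disposition(item: ArchivedItem, rules: dict, today: date) -> str:
    """Return 'retain', 'delete' or 'review' for one archived item."""
    if item.legal_hold:
        return "retain"
    rule = rules.get(item.record_class)
    if rule is None:
        # Unclassified content goes to a person rather than being guessed at.
        return "review"
    expires = item.created + timedelta(days=rule.retain_days)
    return "delete" if today >= expires else "retain"


# Illustrative retention periods only; real periods are set by counsel.
RULES = {
    "general": RetentionRule("general", 365),
    "contract": RetentionRule("contract", 7 * 365),
}
```

The point of the sketch is the separation of duties the section describes: records management would own the contents of `RULES` and the classification of each item, while storage administration would own whatever mechanism acts on the returned disposition.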
In the arms race of e-mail-based litigation, opposing attorneys have successfully attacked e-mail management rather than the e-mails themselves. An organization must be able to demonstrate that it follows clear, well-defined e-mail management policies and procedures. If not, it is open to accusations of systematic destruction or spoliation of e-mail-based evidence.
If proved, these accusations carry severe penalties, ranging from sanctions to an adverse inference, in which the court states that the evidence not produced by an organization is assumed adverse to its case. This can shift the burden of proof from plaintiff to defendant.
Action item: Archive policies and procedures for retention, backup, disaster recovery, security and access control (especially for internal and departed staff) may all come under scrutiny and attack. IT organizations must put into place strong procedures for regular internal and external audit of these policies and procedures.
While archiving competencies will likely remain distinct between infrastructure that scales and applications that provide function, there exists an emerging opportunity for vendors to outsource active archiving services for both infrastructure and applications. Firms that provide storage and related hosting services will find significant opportunity in codifying and administering storage capabilities that are easily translated into risk mitigation and liability management services.
However, while most hosting companies tend to focus on the function of the hosted application and its non-disruptive nature to existing infrastructure (e.g., storage and networking), these new sets of services will no longer only be focused on migrating data to cheaper storage but will also add a dimension of recreating and managing business history. Questions remain as to how to best accommodate this need (e.g., using original data or data extracts) and the legal ramifications of implementation approaches, but the emphasis on processes and policies for classification, migration, retention and deletion will escalate.
Action Item: Vendors will see a major opportunity to provide external services for active archive infrastructure and applications. The vendor imperative is not just to demonstrate infrastructure and applications competence but also business process reengineering and integration capabilities.
For many organizations, the best way of reducing the risks of legal exposure from unstructured data is getting rid of it as soon as legally allowed. This is easier said than done. It is necessary to “shred” the data, not just delete it, as deleted files typically remain stored on drives and tapes waiting to be overwritten. It is also necessary to find all the copies of, for example, an e-mail, not just the original e-mail.
Action item: Organizations should put in place record retention policies that allow and encourage all copies of data to be shredded. This should be done as part of a systematic effort to reduce legal exposure, and not focus on saving storage costs.
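As an illustration of the "find all the copies" problem, the following sketch groups files under a directory tree by a SHA-256 digest of their contents and reports every group with more than one member. It is a simplification under stated assumptions: the function names are hypothetical, it only detects byte-identical copies, and it only locates them. Copies of the same e-mail held in different mailboxes or formats usually differ in some bytes, so a real system would need a different matching key, and actually shredding the located files on disk and tape is a separate problem this sketch does not address.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_copies(root) -> dict:
    """Map each digest to the list of files sharing it, for groups of 2+.

    Only byte-identical files are grouped; near-duplicates are missed.
    """
    by_digest = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            by_digest[content_digest(p)].append(p)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_copies` on the root of a file share returns the groups of identical files, which gives a retention process a list of locations to act on rather than a single "original."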