Storage Peer Incite: Notes from Wikibon’s February 3, 2009 Research Meeting
Pity poor enterprise IT managers. All they want to do is manage the infrastructure and make customers happy. But they've been thrown dead center into one of the hottest legal controversies to face business in decades, with no obvious good choice.
The new legal discovery demands for access to virtual reams of data in liability cases are a lose-lose proposition. One cannot predict what the courts may demand in some future case years hence, and if the organization doesn't archive and preserve whatever that may be, it may cost a job and a company millions in legal penalties. But if organizations archive everything, including every email, every change in every document, not to mention digital telephone recordings and chat sessions, the already rapid increases in storage costs will turn straight up and head for Jupiter and beyond. Companies cannot afford it.
And legal advisors cannot give clear guidance, mainly because this is still untrodden ground in the courts, and many of the issues have not been settled. What is reasonable legal discovery for long-forgotten email exchanges? No one really knows yet, and the answer is likely to vary from jurisdiction to jurisdiction. Then there is the problem that if too much is saved, things that should and could have been destroyed long ago may come back to haunt the enterprise if a clever opposing counsel gets hold of them. There is also the issue of managing the large amounts of corporate material that exist or are changed on user laptops, which often live beyond the electronic boundaries of the enterprise. And who archives documents that are created by one organization and then changed by others involved in a virtual enterprise?
On Tuesday Mike McCreary, VP of startup Rational Retention, a boutique advisory service focused on this specific area, and former head of Pfizer's Legal IT Department, gave the Wikibon community a presentation titled "Managing Archiving and Retention Risk" on the latest developments in this area and architectural approaches to dealing with the issues. That presentation inspired the articles below.
G. Berton Latamore
Originating Author: David Vellante
Guest Analyst: Michael McCreary
In the past five years, corporations have come to realize that the exposure posed by unstructured data generally, and email specifically, required fast action. Organizations responded, as did technology vendors, which seized the opportunity to aggressively market solutions designed to plug gaping holes and forestall immediate threats. This knee-jerk reaction, while necessary, fostered a mentality in many organizations to “archive everything.” The approach has brought with it three pressing challenges for users:
- Storage costs have gone through the roof, with some organizations reporting that email archiving and legal discovery are the fastest growing consumers of capacity in their organizations.
- e-Discovery costs are exploding because discovery is a volume-driven activity and more information means more lawyer time (and bills).
- Hidden risks are mounting. Specifically, by keeping documents that need not be preserved for legal or compliance reasons, organizations expose themselves to opposing counsel using this retained information to re-write history in a legal battle.
Further complicating risk factors is the fact that today's employee is highly mobile and works daily with hundreds of messages, files, database records, and websites, all accessed through various computers and mobile devices. In order to manage and secure these decentralized volumes of data, new tools must do a defensible job of automatically understanding document content and context in order to make lifecycle decisions. Moreover, managing documents at the point of creation, where the content lives, with minimal disruption to users, is critical.
Next Generation Architectures
Recent advances in search, indexing, categorization, and document management technology promise to make it possible for companies to move beyond the practice of shoving everything into a centralized archive, to a more targeted business capability that, for example, ensures documents that need to be retained during a legal hold are secured and those that have reached their end-of-life are fully and defensibly deleted.
A key enabler of this approach is auto-classification. While this is by no means a solved problem, algorithms such as Probabilistic Latent Semantic Indexing (PLSI) and Support Vector Machines (SVM) produce accurate results comparable to those generated by humans. The goal is legal defensibility, not perfection, and this is the standard legal groups need IT to deliver.
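To make the idea of auto-classification concrete, here is a minimal sketch in Python. It does not implement PLSI or an SVM; as a much simpler stand-in it trains a perceptron (a basic linear classifier) on word counts, which is enough to show how labeled examples turn into a rule that sorts new documents. The labels "record" and "non-record", the sample texts, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"

# Hypothetical training set: +1 = business record, -1 = non-record
SAMPLES = [
    ("signed contract amendment with supplier pricing terms", +1),
    ("quarterly financial statement for audit committee", +1),
    ("lunch on friday anyone", -1),
    ("reminder office party this friday", -1),
]


def tokenize(text):
    """Lowercase the text and split it into punctuation-free word tokens."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]


def train_perceptron(samples, epochs=20):
    """Learn one weight per word from (text, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for text, label in samples:
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            if label * score <= 0:  # misclassified (or undecided): adjust
                for t in tokens:
                    weights[t] += label
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


def classify(text, weights, bias):
    """Return 'record' if the weighted word score is positive."""
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return "record" if score > 0 else "non-record"
```

A production classifier would be trained on far more data and measured against human reviewers, since the standard described above is defensibility rather than perfection.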
At its February 3rd Peer Incite, the Wikibon community heard from Michael McCreary of Rational Retention who laid out the major components of next generation information risk management solutions. The key aspect of the approach articulated is the recognition that risk is decentralized and managing distributed assets is a fundamental requirement. To do this, nextgen architectures will continue to rely on centralized metadata repositories while deploying tools to identify, auto-classify, move, and prevent or enforce deletion of data. As well, a policy engine is required to enforce document life-cycle management at the distributed source where content resides.
Importantly, this architecture requires an agent-based system with endpoint discovery tools. Users need to ensure these tools are mature, reliable, and do not disrupt existing operations.
Systems installed in the early part of this decade are largely turning out to be stopgap measures, and recent investments will show minimal obvious business payback. The ROI has been to reduce the risk of getting sued. While many IT managers communicated this fact early on, it’s likely that some in the corner office are unaware that their multimillion dollar insurance policies are aging rapidly and need to be overhauled.
Next generation architectures will require investments in software, services to assess requirements, establish policies, etc., and possibly network upgrades to effectively deal with remote offices and distributed locations.
The payback will be seen in two main areas:
- More efficient use of storage;
- More efficient discovery resources.
While Wikibon has not completed an extensive ROI analysis, it’s hard to believe that storage cost reductions will offset investments in new systems. As well, even with a decentralized management approach, discovery costs are likely to continue to escalate as data volumes rise. The ROI here will be a story of “what would the costs have been if we didn’t act.”
Action Item: Current email archiving and e-Discovery systems are proving to be too expensive, hard to scale, and overly complicated. Organizations must draw on recent experiences to re-assess organizational risk holistically, identify the costs across legal, IT, lines-of-business, audit, etc., and begin the process of defining next-generation architectures. The key will be to leverage current centralized processes and best practices where appropriate while adapting to the new reality of distributed risk management.
Footnotes: Archiving & Retention Risk Management - a White Paper by Michael McCreary
The current path of email archiving and retention systems is not sustainable. Forced to plug 'smoking gun' email holes, organizations are realizing that storage and discovery costs are escalating beyond control, systems are hard to scale, performance is a constant challenge, and "archive everything" approaches bring unnecessary risks.
For example, studies indicate that only 5-10% of an organization's stored content must be retained for legal and compliance reasons, leaving a massive amount of corporate data that is either unnecessary or can be potentially twisted to re-write history and attack a company in a legal sense.
Based on feedback from the LegalTech event, the vendor community is keenly aware that customers are frustrated. Comparisons with ERP and CRM were heard from clients, and vendors are beginning to craft responses. Users however must maintain a healthy skepticism when evaluating roadmaps and be open to new approaches and emerging technologies (e.g. see Digital Reef and Rational Retention). While these and other startups are immature, they appear to be directly attacking tough problems such as auto-classification and distributed risk management. As well, they are not hampered by legacy baggage.
Action Item: Organizations must begin architecting next-generation archiving and retention solutions focused on managing risk at the point where content resides (distributed risk management). Set management expectations that existing highly centralized systems and processes are inadequate and need to accommodate the new mobile workforce reality.
While legal departments are driving the strategy for the acquisition of tools and services to support compliance conformity and litigation preparedness in most enterprises, the process is too often ad-hoc, reactive, and very costly. Many general counsels find that their e-discovery and retention costs are continuing to escalate beyond what they had anticipated even after implementing solutions intended to reduce risk as well as expenses.
A large part of the problem is the lack of available, proven NextGen solutions to properly address the shortcomings of existing FirstGen systems including issues with scalability and performance, integration of modules, poorly conceived architectures and less-than-stellar search and auto classification capabilities. Legal departments, along with representation from IT, Records Management, Compliance, Risk Management and HR departments need to create a business case for proactive investment rather than being stuck in a reactive expense mode.
Since legal owns the budget in most cases, it should also own the process. Managing policies and litigation activities is essential, but more could be done to further mitigate risk and expenses. A first step that doesn’t get enough attention is reducing the sheer number of emails created by encouraging better electronic communication skills and behavior. Imagine reducing unstructured ESI output by 20% without compromising productivity or collaboration, as some firms claim they have done.
Action Item: After taking stock of your existing archiving and retention environment and setting some reasonably attainable objectives, start building the business case around investment in NextGen solutions and architectures. Assume most of the technology you have in place is tactical rather than strategic and that the major vendors in the space know it and are working on replacing their existing offerings. Review solutions from niche and start-up vendors who have been working on solving many of the architectural and technical shortcomings of FirstGen offerings, such as lack of distributed control and improved categorization, as they may be able to bridge a gap to the next generation.
Data preservation obligations associated with litigation and regulations such as Rule 17a have fueled the creation of compliance archives in many organizations. However, without the ability to categorize data by content, firms lack the means to defensibly delete documents from the archive. The resulting escalation in storage and e-discovery costs, coupled with the risk of keeping unneeded data, will drive a fundamental shift away from centralized compliance archives towards a distributed “Preserve in Place” model. The key technology enablers are content-based auto-classification, coupled with a “Command & Control” architecture.
Command will be in the form of centralized document information and metadata used to automatically classify documents against consolidated policies for records, privacy, and data protection. Control will be effected by smart endpoint devices and repositories communicating with the central policy engine to enforce centralized decisions at the O/S and application level. For endpoints, most policy compliance can be ensured by controlling a handful of actions: 1) prevention of deletes (preserve in place); 2) copy or move as directed (collect and selectively centralize); 3) delete as directed (enforce retention). The net result will be centralized, automated decision making, enforced across a distributed and heterogeneous environment. Tools such as Rational Retention are delivering these capabilities today.
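The split between central command and endpoint control can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how Rational Retention or any other product works: the `Document` fields, the category names, the retention periods, and the `decide_action` function are all hypothetical. The central policy engine makes the decision; an endpoint agent would simply carry out whichever of the three actions it is handed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    PREVENT_DELETE = "preserve in place"
    COPY_OR_MOVE = "collect and selectively centralize"
    DELETE = "enforce retention"
    NONE = "no action"


@dataclass
class Document:
    doc_id: str
    category: str              # assigned by auto-classification
    created: date
    custodian: str             # user whose endpoint holds the document
    centralized: bool = False  # already collected into a central store?


# Hypothetical central policy: retention period in days for each category
RETENTION_DAYS = {"contract": 7 * 365, "routine-email": 90}
# Hypothetical policy: categories that should be collected centrally
COLLECT_CATEGORIES = {"contract"}


def decide_action(doc, today, held_custodians):
    """Central decision that an endpoint agent would then enforce."""
    # 1) A legal hold overrides everything: block deletion at the endpoint.
    if doc.custodian in held_custodians:
        return Action.PREVENT_DELETE
    # 2) Past its retention period: delete as directed.
    days = RETENTION_DAYS.get(doc.category)
    if days is not None and today > doc.created + timedelta(days=days):
        return Action.DELETE
    # 3) Still live and flagged for collection: copy or move as directed.
    if doc.category in COLLECT_CATEGORIES and not doc.centralized:
        return Action.COPY_OR_MOVE
    # Categories with no policy are left alone rather than deleted.
    return Action.NONE
```

Checking the legal hold first is a deliberate choice in this sketch: a document under hold is never deleted, even if its retention period has already expired.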
Action Item: Avoid the quick fix of compliance archiving without ensuring some method for classification and destruction, and explore next generation tools to centralize policy administration with open APIs to hook into smarter endpoints.
A review of the major e-discovery vendors this week at Legal Tech in New York City has led me to believe that those of us hoping to get a peek at the next generation of solutions for the space will have to wait a little longer. While there are some well regarded solutions and services that address the requirements within a legal department, a smooth e-discovery process as defined by the EDRM framework needs to rely on many other architectural and technology or services components to function adequately.
This is not an easy task. Lots of smart people, each seemingly coming from a different angle, have been working hard to solve the inherent end-to-end problems, which range from lack of scalability through integration woes, unbridled storage growth, inflexible policy management and auto-classification engines, and inadequate search capabilities. They have been at it ever since the first group of users went searching for help with their compliance and risk mitigation requirements to keep the regulators and opposing litigants at bay.
Vendors and service providers understandably responded with solutions that were created for other purposes but, as we soon discovered, were not well suited to sift through millions of email records located on multiple types of storage media accessed over constrained networks and perhaps distributed throughout a corporation’s IT assets - or even beyond to employees’ personal computers and storage devices. These systems were the first generation offerings. Unfortunately, many of these FirstGen system components are still an integral part of today’s offerings.
Most legal and IT teams are in a reactive mode when it comes to acquiring e-discovery, archiving, and retention solutions. This is understandable given today’s regulatory and litigation environment. However, buyers are becoming more sophisticated in the space and are beginning to expect more from their vendors than integrated solutions that one could argue are made up of outdated FirstGen components rather than a unified solution architected and built from the ground up.
While integrating e-discovery, message archiving and enterprise search with content management and collaboration tools would seem like a natural fit to fulfill many a business productivity dream, I fear this is too much of the cart before the horse. The only logical rationale for acquiring this set of solutions is to acquire customers and some smart applications design people and to knock out potential competitors. I don’t see how it serves the customer’s best interests.
Action Item: E-discovery and archiving vendors need to tackle the real problems users face with their existing solutions and offer major enhancements or, better yet, NextGen replacements for outmoded FirstGen solutions. The market needs truly unified, scalable, enhanced solutions, not more integration of older systems.
Many litigation-prone organizations have installed archive and e-Discovery point solutions for the purposes of ensuring record retention compliance, improving auditability, reducing the cost of e-discovery, and protecting themselves from litigation. Inherent in these solutions is a “data retention forever” strategy.
These solutions are not sustainable long term for three reasons:
- The cost of retaining all electronic records forever is increasing forever;
- The cost of e-discovery is directly related to the amount of data retained, and that cost is escalating exponentially;
- Data retention forever increases the legal risk by creating more opportunities for an adversarial counsel to re-write history.
The growing consensus is that organizations should move from a reactive data retention forever strategy to a proactive strategy that allows the deletion of records when end of life-cycle is reached (see Stop reacting and start managing archiving and retention risk, The yin and yang of information risk management and Time to rethink archiving and retention strategies).
New technologies such as auto-classification and new architectures will be required to deal with the distributed nature of data and data communication, and bring data retention costs under control. Many of the centralized point solutions installed today will have to be replaced if they cannot be retrofitted to fit into these architectures.
Action Item: The costs of data retention are projected to spiral out of control, and IT in litigation-prone organizations must drive new technology and processes to ensure that almost all data can be deleted automatically. Organizational IT governance systems need to evolve to ensure that new projects are not approved or funded unless they incorporate the ability to get rid of data.