Originating Author: David Vellante
Guest Analyst: Michael McCreary
In the past five years, corporations have come to realize that the exposure posed by unstructured data generally, and email specifically, requires fast action. Organizations responded, as did technology vendors, which seized the opportunity to aggressively market solutions designed to plug gaping holes and forestall immediate threats. This knee-jerk reaction, while necessary, led many organizations to adopt an “archive everything” mentality. The approach has brought with it three pressing challenges for users:
- Storage costs have gone through the roof, with some organizations reporting that email archiving and legal discovery are their fastest-growing consumers of capacity.
- e-Discovery costs are exploding because discovery is a volume-driven activity and more information means more lawyer time (and bills).
- Hidden risks are mounting. Specifically, by keeping documents that need not be preserved for legal or compliance reasons, organizations expose themselves to opposing counsel using this retained information to rewrite history in a legal battle.
Further complicating matters, today's employee is highly mobile and works daily with hundreds of messages, files, database records, and websites, all accessed through various computers and mobile devices. To manage and secure these decentralized volumes of data, new tools must do a defensible job of automatically understanding document content and context so that they can make lifecycle decisions. Moreover, it is critical to manage documents at the point of creation, where the content lives, with minimal disruption to users.
Next Generation Architectures
Recent advances in search, indexing, categorization, and document management technology promise to let companies move beyond the practice of shoving everything into a centralized archive and toward a more targeted business capability. Such a capability would ensure, for example, that documents that must be retained during a legal hold are secured and that those that have reached the end of their life are fully and defensibly deleted.
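The hold-or-delete decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `Document` fields, the `Action` names, and the idea of a per-document retention period in days are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    HOLD = "hold"      # preserve: document is subject to a legal hold
    RETAIN = "retain"  # keep: retention period is still running
    DELETE = "delete"  # dispose: retention period has expired


@dataclass
class Document:
    doc_id: str
    created: date
    retention_days: int          # hypothetical retention period for its category
    on_legal_hold: bool = False  # set when a hold covers this document


def lifecycle_action(doc: Document, today: date) -> Action:
    """Decide what to do with a document; a legal hold always wins."""
    if doc.on_legal_hold:
        return Action.HOLD
    expires = doc.created + timedelta(days=doc.retention_days)
    return Action.DELETE if today >= expires else Action.RETAIN
```

The ordering of the checks is the point of the sketch: the hold test comes first, so an expired document under hold is never deleted.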
A key enabler of this approach is auto-classification. While this is by no means a solved problem, algorithms such as Probabilistic Latent Semantic Indexing (PLSI) and Support Vector Machines (SVM) can produce results comparable in accuracy to those of human reviewers. The goal is legal defensibility, not perfection, and this is the standard legal groups need IT to deliver.
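To make the idea of auto-classification concrete, here is a toy linear SVM over word counts, trained by subgradient descent on the hinge loss. It is a deliberately simplified stand-in written with only the Python standard library; it is not the method of any particular product, and the labels, training documents, and hyperparameters are invented for illustration. A production system would rely on a tested library and far more training data.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; real systems would do far more."""
    return text.lower().split()


def train_linear_svm(samples, epochs=20, eta=0.1, lam=0.01):
    """Train a linear SVM by subgradient descent on the hinge loss.

    samples: list of (text, label) pairs, where label is +1 or -1.
    Returns a dict mapping each word to its learned weight.
    """
    weights = {}
    for _ in range(epochs):
        for text, label in samples:
            feats = Counter(tokenize(text))
            margin = label * sum(weights.get(w, 0.0) * c for w, c in feats.items())
            # L2 regularization step: shrink every weight slightly.
            for w in weights:
                weights[w] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + eta * label * c
    return weights


def classify(weights, text):
    """Return +1 (business record) or -1 (non-record) for a document."""
    feats = Counter(tokenize(text))
    score = sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return 1 if score > 0 else -1


# Hypothetical training set: +1 = business record, -1 = non-record.
TRAINING = [
    ("contract amendment signed by counsel", 1),
    ("litigation hold notice from counsel", 1),
    ("lunch menu for friday", -1),
    ("office party friday afternoon", -1),
]
```

Calling `train_linear_svm(TRAINING)` and passing the result to `classify` labels new text by the sign of its score. Words the model has never seen contribute nothing, which is one reason a real deployment needs a much larger and more representative training set.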
At its February 3rd Peer Incite, the Wikibon community heard from Michael McCreary of Rational Retention, who laid out the major components of next-generation information risk management solutions. The key aspect of the approach he articulated is the recognition that risk is decentralized and that managing distributed assets is a fundamental requirement. To do this, next-generation architectures will continue to rely on centralized metadata repositories while deploying tools to identify, auto-classify, move, and prevent or enforce deletion of data. In addition, a policy engine is required to enforce document life-cycle management at the distributed source where content resides.
Importantly, this architecture requires an agent-based system with endpoint discovery tools. Users need to ensure these tools are mature, reliable, and do not disrupt existing operations.
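One way to picture the division of labor in this section is a central metadata repository that knows what exists and what is on hold, with endpoint agents that hold the content and enforce policy locally. The sketch below is an assumption-laden illustration only: the class names, method names, and category strings are hypothetical, and in-memory dictionaries stand in for real storage and networking.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    doc_id: str
    endpoint: str          # name of the agent where the content lives
    category: str          # result of classification, e.g. "contracts"
    on_hold: bool = False


@dataclass
class MetadataRepository:
    """Central index of metadata only; content stays on the endpoints."""
    records: dict = field(default_factory=dict)

    def register(self, record: MetadataRecord) -> None:
        self.records[record.doc_id] = record

    def place_hold(self, category: str) -> list:
        """Mark every document in a category as held; return affected ids."""
        affected = []
        for record in self.records.values():
            if record.category == category:
                record.on_hold = True
                affected.append(record.doc_id)
        return affected


class EndpointAgent:
    """Runs where the content resides and enforces central policy locally."""

    def __init__(self, name: str, repo: MetadataRepository):
        self.name = name
        self.repo = repo
        self.files = {}  # doc_id -> content, standing in for local storage

    def discover(self, doc_id: str, content: str, category: str) -> None:
        self.files[doc_id] = content
        self.repo.register(MetadataRecord(doc_id, self.name, category))

    def request_delete(self, doc_id: str) -> bool:
        """Delete locally only if the central policy allows it."""
        record = self.repo.records.get(doc_id)
        if record is None or record.endpoint != self.name or record.on_hold:
            return False
        self.files.pop(doc_id, None)
        del self.repo.records[doc_id]
        return True
```

In this arrangement a hold placed centrally takes effect on every endpoint without moving any content, which is the property the decentralized approach depends on.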
The ROI
Systems installed in the early part of this decade are largely turning out to be stopgap measures, and recent investments will show minimal obvious business payback. The ROI has been to reduce the risk of getting sued. While many IT managers communicated this fact early on, it’s likely that some in the corner office are unaware that their multimillion dollar insurance policies are aging rapidly and need to be overhauled.
Next-generation architectures will require investments in software, in services to assess requirements and establish policies, and possibly in network upgrades to deal effectively with remote offices and distributed locations.
The payback will be seen in two main areas:
- More efficient use of storage;
- More efficient use of discovery resources.
While Wikibon has not completed an extensive ROI analysis, it’s hard to believe that storage cost reductions alone will offset investments in new systems. Moreover, even with a decentralized management approach, discovery costs are likely to continue to escalate as data volumes rise. The ROI here will be a story of “what would the costs have been if we didn’t act.”
Action Item: Current email archiving and e-Discovery systems are proving to be too expensive, hard to scale, and overly complicated. Organizations must draw on recent experience to reassess organizational risk holistically, identify the costs across legal, IT, lines of business, audit, and other groups, and begin the process of defining next-generation architectures. The key will be to leverage current centralized processes and best practices where appropriate while adapting to the new reality of distributed risk management.
Footnotes: Archiving & Retention Risk Management - a White Paper by Michael McCreary