Wiki Trends:
In 2010, information management (IM) solution providers will battle over who “owns” organizational data. IM solutions will merge structured and unstructured content ecosystems as customers seek tools that support information access across all major content repositories.
By 2012, 75% of enterprise IT organizations will have implemented strategic data reduction initiatives designed to reclaim wasted storage space, reduce risk, and eliminate unwanted data.
Because of the rampant growth of enterprise data and ever-expanding requirements to ingest, analyze, and incorporate various forms of content from the Web and other outside sources, firms of all sizes in every major industry and market segment, as well as public-sector organizations, are experimenting with and implementing solutions and services to help manage their entire corpus of electronically stored information (ESI).
The vast majority of enterprises have implemented point solutions, product suites, or vendor services to relieve significant data management bottlenecks and meet critical business needs in compliance, customer service, e-discovery, and regulatory workloads. However, most firms have capitulated under the sheer volume of data and are no longer willing or able to manage all of it effectively, exposing themselves to risks that include fines, lawsuits, lost business, and higher IT costs.
To illustrate this point, the following excerpts were gleaned from user conversations and presentations at the IQPC eDiscovery Event in New York City in December 2009:
- “We’ve given up trying to index everything. We can’t even keep up with the volume of new data that is being created. We store what we consider non-essential records without indexing and hope we never have to access them.” - IT Director, Multinational Financial Services Firm
- “Keeping everything is not a good approach. Technology alone cannot solve the problem. Records management, compliance, and legal should work together – even if they may not like it.” - Senior Litigation Counsel of a multibillion-dollar manufacturer and Director of Records Management for a Fortune 500 services company
- “Research indicates that only 22% to 57% of relevant documents are found in even well-constructed searches. Users need to understand the risks and benefits of different searches and need to refine their search terms.” - General Counsel, regional healthcare facility
- “We found over 1,200 copies of a single document in our various repositories. In addition, we found that 40% of our data is exact matches. Yet I’m not convinced we found all copies, as I cannot know for certain that we are able to search our entire data set effectively.” - CIO, pharmaceutical industry
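Findings like the CIO’s rest on detecting exact duplicates. One common approach is to group documents by a cryptographic hash of their content. The following is a minimal Python sketch, assuming documents have already been collected into memory as a mapping from a location string to raw bytes; the function name `duplicate_report`, the input format, and the definition of "redundant" (bytes held in copies beyond the first instance) are all hypothetical choices for illustration. Exact hashing will not catch near-duplicates, such as the same text saved in two file formats.

```python
import hashlib
from collections import defaultdict


def duplicate_report(docs):
    """Group documents by SHA-256 content hash (hypothetical audit helper).

    docs: mapping of location -> raw bytes. A real audit would stream
    content from each repository instead of holding it all in memory.
    Returns (groups, redundant_fraction): groups maps each digest to the
    locations holding identical content; redundant_fraction is the share
    of total bytes occupied by copies beyond the first of each document.
    """
    groups = defaultdict(list)
    sizes = {}
    for location, content in docs.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)
        sizes[digest] = len(content)
    total = sum(len(content) for content in docs.values())
    # Every copy after the first is redundant storage.
    redundant = sum(sizes[d] * (len(locs) - 1) for d, locs in groups.items())
    fraction = redundant / total if total else 0.0
    return dict(groups), fraction


if __name__ == "__main__":
    sample = {
        "fileshare/q3.doc": b"report-v1!",
        "email/att-17.doc": b"report-v1!",
        "ecm/records/q3.doc": b"report-v1!",
        "fileshare/memo.txt": b"memo: TBD.",
    }
    groups, fraction = duplicate_report(sample)
    for locations in groups.values():
        if len(locations) > 1:
            print(len(locations), "copies:", locations)
    print(f"redundant share of bytes: {fraction:.0%}")  # 50%
```

As the CIO notes, a report like this is only as complete as the set of repositories fed into it.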
The above examples could easily apply to many different use cases. Sadly, for the last several years, information governance best practices have taken a back seat to corporations’ compliance and litigation requirements and to the growing need to “quickly” derive knowledge or insight from vast amounts of data. IT has learned that retaining and indexing everything is not a viable option. At the same time, siloed data has proliferated across disparate repositories in structured, semistructured, and unstructured formats, accessed by multiple applications that interoperate poorly or not at all, and the situation has become untenable. Users must amend their practices, and solution providers must evolve their products and services to ameliorate it.
Pre-Discovery Tools
While most organizations still lag behind in adopting or updating their information governance best practices, many vendors and service providers have been busy developing solutions for “chunking” through vast amounts of data to determine its value. Most IT professionals are familiar with, and already deploy, compression, deduplication, and single-instancing techniques to reduce their data storage requirements. Business intelligence (BI) platforms work well with master data management (MDM) and extract, transform, load (ETL) tools, and advanced analytics tools are widely adopted by IT and knowledge workers. In addition, stream computing and analytics are being applied to the task of quickly evaluating large amounts of web-based, real-time data and content.
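Compression, deduplication, and single instancing are often combined. As a minimal sketch, assuming an in-memory Python dictionary stands in for the storage back end, the hypothetical `SingleInstanceStore` class below keeps each unique piece of content once (keyed by its SHA-256 digest), compresses it with the standard-library `zlib` module, and records every additional copy as a reference only. Production systems typically operate at the file or block level and add persistence and reference counting, which this sketch omits.

```python
import hashlib
import zlib


class SingleInstanceStore:
    """Toy single-instance store (hypothetical): identical content is kept
    once, keyed by its SHA-256 digest, and compressed with zlib."""

    def __init__(self):
        self._blobs = {}  # digest -> zlib-compressed content
        self._sizes = {}  # digest -> uncompressed size in bytes
        self._refs = {}   # logical name -> digest

    def put(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            # First time this content is seen: compress and store it once.
            self._blobs[digest] = zlib.compress(content)
            self._sizes[digest] = len(content)
        # Every later copy only adds a reference to the existing blob.
        self._refs[name] = digest

    def get(self, name):
        return zlib.decompress(self._blobs[self._refs[name]])

    @property
    def logical_bytes(self):
        # What users believe they are storing: one full copy per name.
        return sum(self._sizes[d] for d in self._refs.values())

    @property
    def physical_bytes(self):
        # What is actually stored: one compressed blob per unique content.
        # Blobs orphaned by overwriting a name are not garbage-collected.
        return sum(len(b) for b in self._blobs.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    policy = b"Retention policy: keep invoices for seven years. " * 40
    for path in ("hr/policy.doc", "legal/policy.doc", "mail/att-3.doc"):
        store.put(path, policy)
    assert store.get("legal/policy.doc") == policy
    print(store.logical_bytes, "logical bytes ->", store.physical_bytes, "physical bytes")
```

Because all three names share one compressed blob, physical storage shrinks sharply, yet nothing in this process says whether the content is worth keeping at all.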
Meanwhile, IBM, NEC, and smaller, focused services firms such as Collabera and DCL are offering services or rolling out products next year that address the “find,” or pre-discovery, phase. These tools and services let customers review content regardless of its structure or whether it originated inside or outside the organization. IBM’s solution offers users an “understanding of the business value of content and supports dynamic analysis of content by business analysts and allows the user to preserve data in place, classify, decommission, or even generate a taxonomy while working within a federated or consolidated repository framework.” The capability was announced at IBM’s Information on Demand event in October 2009, and full product details will be available in Q1 2010.
Bottom Line
Enterprises retain far too much data that is redundant or irrelevant. Both the “save everything” and the “index selectively” approaches are unsustainable and expose the enterprise to unnecessary risks. Improved policies for the creation and disposition of ESI, along with deployment of compression, deduplication, business intelligence, analytics, and pre-discovery assessment tools, can help control data creep and ultimately reduce risk, improve performance, and lower costs.
Action Item: IT professionals, along with compliance, legal, and records management stakeholders, need to perform data audits or assessments that go beyond the compression and deduplication capabilities they are most familiar with. That functionality does not determine the relevance or criticality of data as valued information, and traditional BI tools have focused largely on structured data. Users must review vendors’ pre-discovery, find, or content assessment capabilities and determine their suitability for their IT environment and business requirements.