Contributions by Michael Versace
In 2010, information management (IM) solution providers battle over who “owns” organizational data. IM solutions merge structured and unstructured content ecosystems as customers seek tools to support information access across all major content repositories.
Organizations struggle with the sheer volume of data they have created and stored in all of its forms: structured, unstructured and semi-structured. The stewards of enterprise data need to protect, manage and transform data into a trusted, valued business asset while decreasing their costs to do so. With the prospect of growth in unstructured data alone (audio, documents, emails, images, video and Web content) doubling in the next 18 months, information governance has never been more critical.
Information Governance is what happens when compliance meets information management. In other words, it's a combination of business practices, technology and human capital for meeting an entity's compliance, legal, regulatory, and security requirements as well as its organizational goals. Information governance provides a means to protect, access, and otherwise manage data and transform it into useful information.
The financial services, energy, and pharmaceutical industries have massive amounts of data to manage for regulatory and litigation purposes but also rely heavily on the ability to access information for product development, marketing, and systems monitoring. Government agencies also need secure, fluid access to information for a variety of reasons.
While best practices such as applying physical and electronic security measures and creating policies for the disposition of data are critical to implementing an information governance strategy, available technology solutions and services can play a key role in several areas, including:
- Integration and Interoperability: Enterprise data resides in multiple repositories associated with archive (active, passive and deep), CRM, collaboration, ECM, and RM solutions as well as databases, mail servers and assorted file stores. The ability of these systems to share data or make their indexes and metadata available to other systems is critical for combining data sources into a unified view, for example a view of a customer based on email correspondence, a signed contract, demographic information, and the like.
- Find, Index and Search: Accessing all repositories, and perhaps desktops as well, with incremental index updates requires building or deploying connectors, agents, or crawlers, depending on the approach. You can’t search what you can’t find and index. Enterprise search purists continue to argue the merits of federated vs. unified searches and the overhead necessary to maintain the indexes. However, the fact is that if a solution’s index is not accessible, or worse, a repository doesn’t have an index building capability, information governance is severely hampered.
- Auto-Classification and Data Categorization: The ability to automatically classify data into categories and even merge an organization’s existing categories with new ones is promising but has met with mixed results. Users are skeptical that technology can provide the business logic and contextual framework to create meaningful categories. On the other hand, to attempt to perform this task manually is often overwhelming. Auto-classification needs to be a consideration for enabling time-to-information or insight.
- Information Assurance and Privacy: The ability to establish, enforce, and monitor policies for ensuring the confidentiality, integrity, and availability of properly classified and categorized data. These capabilities must be enabled during or shortly after classification and categorization in order to be cost effective and functional across heterogeneous compute, network, and storage infrastructures.
- Disposition and Information Insight: Once an organization can be assured that all of its data is found, secured, indexed, and classified, then policy (archive, delete or encrypt) and some level of data compression (single instancing, deduplication) can be applied, if it hasn’t been already. Some solutions providers argue that older data that is not subject to retention requirements or does not contain potential information assets should be deleted prior to the indexing and classification process. Regardless of the approach, a process for getting rid of stuff (GRS) is vital.
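The disposition step in the last bullet can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `Document` fields, the `RETENTION_YEARS` table, and the function names are all hypothetical, and single instancing is approximated here by comparing SHA-256 content hashes.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    path: str
    content: bytes
    category: str   # assumed to come from an earlier classification step
    created: date


# Hypothetical retention periods in years; None means keep indefinitely.
RETENTION_YEARS = {"contract": 7, "email": 3, "legal_hold": None}


def single_instance(docs):
    """Keep the first copy of each distinct content; return (unique, duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in seen:
            duplicates.append(doc)
        else:
            seen.add(digest)
            unique.append(doc)
    return unique, duplicates


def disposition(doc, today):
    """Return 'archive', 'delete', or 'review' for one document."""
    if doc.category not in RETENTION_YEARS:
        return "review"            # unclassified data needs a human decision
    years = RETENTION_YEARS[doc.category]
    if years is None:
        return "archive"           # no expiry, e.g. a legal hold
    age_years = (today - doc.created).days / 365.25
    return "delete" if age_years > years else "archive"
```

Running `single_instance` first and `disposition` on the surviving copies mirrors the order described above: classify, reduce, then apply policy. A real deployment would also need an audit trail and a check for litigation holds before anything is deleted.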
Debate also rages over whether it is better to manage data in place or migrate it to a single repository. The argument for a single logical repository is better control and a single policy framework for managing data, while the argument against is the cost and time it takes to migrate large data stores. A manage-in-place approach must rely on a large index and metadata repository and on the ability to synchronize with disparate indexes and policies. Which approach to take will probably depend on the volume and complexity of an organization’s data, its policy or compliance requirements, and the number of repositories.
Where To Turn
Fact: There are no end-to-end information governance or management solutions in the marketplace today, only a collection of software suites, point solutions, and services. Even for organizations looking to tackle a single problem such as email, the files in that system need to be “discoverable” by other systems such as enterprise search, policy or records management, and litigation software as well as BI and analytics tools.
Most enterprises with monolithic silos of data are not keen on being in the integration business and seek the support of solutions providers who can bring their own products and services, along with partner solutions and demonstrable thought leadership, to the table. Typically these solution providers are firms with trusted relationships, proven expertise and large services organizations, such as EMC, HP or IBM, as well as lesser-known firms such as RSD.
As an example, HP has a suite of Information Management products that includes archiving integrated with records management (RM), an e-discovery suite complemented with partners such as Clearwell, BI and data warehousing tools, data protection, an ECM solution through the Tower Software acquisition, integration capabilities with SAP and an array of services to support complex implementations. In addition, it has an interesting content intelligence technology, Taxonom, provided in a SaaS model, which allows users to create custom taxonomies in near real time to classify data from multiple sources, including internal repositories and the Web.
In the case of enterprise search, or information access technology, support and innovation can come from smaller firms such as Coveo, Endeca and Vivisimo. Coveo takes a unified search approach, ingesting the indexes of the repositories it links to, as opposed to the federated approach taken by the others. They all compete with Google (GSA), Microsoft (FAST) and IBM (OmniFind).
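The difference between the federated and unified approaches can be illustrated with a toy Python sketch. It does not represent how Coveo or any other product named here works internally; the repositories, document IDs, and function names are invented for illustration. A federated search queries each repository at query time and merges the hits, while a unified search consults one combined index built ahead of time.

```python
from collections import defaultdict

# Hypothetical repositories: repository name -> {document id: text}.
REPOSITORIES = {
    "mail": {"m1": "Signed contract attached", "m2": "Lunch on Friday"},
    "ecm": {"e1": "Master services contract 2010"},
}


def tokenize(text):
    """Lowercase and split on whitespace; real engines do far more."""
    return text.lower().split()


def federated_search(term, repositories):
    """Ask every repository at query time, then merge the hits."""
    hits = []
    for repo_name, docs in repositories.items():
        for doc_id, text in docs.items():
            if term.lower() in tokenize(text):
                hits.append((repo_name, doc_id))
    return sorted(hits)


def build_unified_index(repositories):
    """Ingest all repositories once into a single inverted index."""
    index = defaultdict(set)
    for repo_name, docs in repositories.items():
        for doc_id, text in docs.items():
            for token in tokenize(text):
                index[token].add((repo_name, doc_id))
    return index


def unified_search(term, index):
    """Answer the query from the combined index alone."""
    return sorted(index.get(term.lower(), set()))
```

Both functions return the same hits for the same data. The trade-off sits in where the work happens: the federated version touches every repository on each query, while the unified version pays the cost up front and must keep its index in sync as the repositories change.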
Information Governance and the enabling technologies that support it are evolving. Proven solutions such as archiving, data protection and reduction, records management, e-discovery, and enterprise search, when implemented and used correctly, can speed time to knowledge, help meet compliance and regulatory requirements, reduce the risk of keeping data too long, and lower costs by improving storage utilization and more efficiently culling and securing data. Emerging technologies such as auto-classification, advanced enterprise search or information access, and taxonomy building hold the promise of helping users to reach relevant data more quickly and in context. Integration and interoperability of systems and data stores are of paramount importance.
Action Item: Users need to implement or improve information governance programs to include business best practices gleaned from legal, IT, records management and other intra-company disciplines and marry those with the best technologies available today. They need to act while knowing improvements are on the horizon and that no one vendor can provide a truly end-to-end solution from only its own offerings. Look to trusted vendors who can integrate solutions and include innovators who offer advanced features that will support your information governance strategy.
Footnotes: Functionality Sets for managing unstructured data