Storage Peer Incite: Notes from Wikibon’s June 12, 2007 Research Meeting
This week, we observe that storage management is facing a major evolution in response to pressures like the introduction of new technologies and the increasing need to recover, on demand, full business records made up of multiple interrelated documents. Legal requirements, business recovery needs and other related issues are shifting the storage emphasis from bits on a platter and provisioning to the capture of increasing amounts of new data and the assured recovery of complete business records. These records include both structured and unstructured files that together often provide very valuable information, and managing them will require a new set of skills. No clear answer to the organizational and computing issues involved in satisfying these new needs exists, but companies in some industries are already developing their own solutions, and that trend will grow over the next five years. Organizations that do not pay close attention to this new set of demands in the business environment do so at their own risk.
Driven by significant change in storage technologies and business requirements, we are starting to see the kind of organizational tension in storage functions that heretofore has been identified mainly with applications development, network development, server, and security groups in IT organizations.
Until now, storage administrators were protected from this pressure mainly because they encountered business change in the form of demands for new storage at better price points with different performance levels. However, with the introduction of storage services enabled by new technologies such as virtualization, data deduplication and thin provisioning, combined with heightened business requirements for assured access to business records to support compliance, disaster recovery, and other demands that directly impact how storage is managed, storage administration is facing significant change.
Exactly how this combination of traditional and new storage activities will be organized is unclear. But whatever organizational form emerges must accommodate some new realities inside the management of information. This new function in the organization – which might be termed “records management” since it involves organizing information as complete business records, although this should not be confused with the often microfiche-based records management of the past – will have several responsibilities. The first of those will be the introduction of policy that fundamentally shifts the focus of storage management from the acquisition and implementation of storage assets for capturing new data to issues surrounding retrieving and reusing coherent business records, often made up of multiple components from different sources. An example might be a business transaction with all associated e-mails, memos, background information and other relevant records.
This emphasis on accessing business information with all its metadata intact, in a form that new software can reuse after the original applications that created the data are replaced, has been growing for 10-15 years. It is driven by compliance, business recovery and other needs that go beyond retaining just the bytes that show business transactions.
The new responsibilities of records management focus on three types of policy:
- Transitioning the view of data from bits on a platter to business records. This includes better storage-level metadata management and classification activities.
- Designing and implementing business systems for business continuance – disaster recovery and other considerations to ensure organizations can access the state of the business in the event of continuity-threatening contingencies.
- Defining when and what data is deleted and what is maintained.
These groups of policies will form the core of the records manager's toolkit. They will also become the touchstone for identifying information assets and liabilities as information artifacts rather than just application, data, and device attributes.
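The third policy type, deciding what is deleted and what is maintained, can be pictured as a simple retention schedule. The following is a minimal sketch only: the `RetentionRule` type, the `disposition` function, the record classes and the retention periods are all hypothetical, chosen for illustration rather than drawn from any product or regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep records of one class for a fixed number of days."""
    record_class: str
    retain_days: int


def disposition(record_class: str, created: date, today: date,
                rules: dict[str, RetentionRule]) -> str:
    """Return 'retain', 'delete', or 'review' for a single record."""
    rule = rules.get(record_class)
    if rule is None:
        # Unclassified records are routed to a person, never deleted automatically.
        return "review"
    expires = created + timedelta(days=rule.retain_days)
    return "delete" if today >= expires else "retain"


# Illustrative schedule only; real periods would come from legal and compliance.
RULES = {
    "contract": RetentionRule("contract", 7 * 365),
    "chat": RetentionRule("chat", 90),
}

today = date(2007, 6, 12)
chat_result = disposition("chat", date(2007, 1, 1), today, RULES)
contract_result = disposition("contract", date(2007, 1, 1), today, RULES)
memo_result = disposition("memo", date(2007, 1, 1), today, RULES)
```

The design choice worth noting is the "review" outcome: a record that has not been classified cannot be defensibly deleted, which is why classification and metadata come first in the list above.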
Over the next few years different enterprises – driven by such issues as size, degree of information-intensity in the business, and regulatory environments – will pursue different paths to achieve this records-management capability inside their organizations. We will see different alignments within IT – for example between database and storage administration – to better accommodate the realities of overall information management at a schema level. While this will move at varying speeds and may take different forms in organizations, in the next five years we will see new procedures for records management form and evolve in business.
Action Item: Storage administrators must recognize that many of their traditional practices will be automated by improved tools for configuration and allocation. However, this does not herald an end to the requirement for smart storage management. Rather it creates the possibility that professionals focused on storage can start aggressively learning new skills and performing new practices in response to the increasing business need for true records management.
For several years, unstructured data and information have grown more rapidly than structured data, driving demand for unstructured content storage. Because they contain highly structured information, corporate systems such as ERP, financials, CRM, supply chain, etc. can be credibly used to recreate a sequence of events indisputably: who placed the order, for what, for how much, when and on what terms. Unstructured information such as emails, documents, spreadsheets, audio, video, etc., on the other hand, forms a mass of stored information in which it is much more difficult to find what's relevant and replicate a decision flow in a manner that is provable with 100% certainty.
This represents a huge liability for organizations as information uncertainty grows daily and exponentially (see Figure 1).
Organizations must operate on the assumption that whistle blowers, opposing lawyers or even unwitting employees will gain access to or leak information that can be extremely damaging. Indeed, knowing what to keep, where to put it, how to relate it to other information, when to destroy it and how to prove to a court that it's all been done above board is a challenge that, if not handled properly, can land a company's CEO in jail and/or cost an organization literally billions of dollars.
The implications of this trend are enormous for organizations in general and IT groups in particular. Specifically, as the management of unstructured information becomes a corporate imperative driven by legal and compliance, business lines will naturally fight to protect and expand their turf and push for expenditures that not only cover the board of directors, but ultimately add value to the business. CIOs must understand this dynamic and respond to the various constituencies that on the one hand demand that corporate liabilities be minimized and on the other that assets be leveraged to a very high degree.
Action Item: CIOs should, in the near term, partner with legal and sell the board on the need to catalyze action and put in place systems to reduce the huge liability posed by unstructured information growth. Organizations should use this opportunity to gain experience around processes and procedures (e.g. auto-classification) that are applicable to unstructured information management so that over time, they can be applied to create tangible business value (e.g. via improved productivity, information mining, etc.) and satisfy the demands of business users.
Projecting forward to the year 2012 and looking at a best-of-breed records management system, it is easy to see the benefits derived from reduced corporate exposure and improved productivity. With regard to an organization's ability to recreate a sequence of events, decisions and transactions, there is no difference between structured and unstructured information. All pertinent information surrounding a business process can be readily accessed.
Systems are in place to identify and eliminate “rogue” elements that could be used by opposing counsel. In addition, systems can differentiate “work-in-progress” records (e.g. stored chat about a potential liability) from final outcomes and eliminate extraneous and potentially damaging information. Internal audit functions have been radically simplified and enhanced by the improved access to information. More accurate institutional memory and access to information have led to significant improvements in business processes, although there are still some complaints from lines of business about the overheads of establishing and complying with record creation policies.
Action Item: CIOs must work with the business to sell the imperative of records management justified on the basis of risk reduction (initially) and improved productivity (longer term). IT governance committees should include representation from the CEO and general counsel that support the creation of a cross-functional team, with a dedicated champion, to focus on records management. Business functions must collectively agree that the reduction in their freedom to operate is easily offset by the benefits of this initiative.
The traditional scope of storage influencers, recommenders and approvers is changing. As the saying goes, "there's a new sheriff in town," and its name is records management. Not the traditional microfiche-based records management that's been in the organizational basement for years, but an emergent capability within corporations that defines retention and classification policy, articulates legal and business requirements and coordinates the activities of cross-functional teams responding to directives from the board room. This new function is catalyzing the development of infrastructures that provide speedy access to unstructured data and encapsulate a series of data blobs into an information asset.
Here's the good news for storage vendors. In order for this vision to become reality, tons of metadata will need to be created to support this effort. Probably more metadata than data, which means more storage. The bad news is that based on marketing messages and selling behaviors, it's becoming clear that very few established storage vendors understand this trend and even fewer are in a position to develop associated storage services that actively exploit an emerging ecosystem of policy engines, search, data classification, document management systems, databases, applications and associated services.
Action Item: Storage vendors need to set forth a vision and define a role in the new records management imperative. Suppliers must add value from marketing through technology integration by developing solutions that automate the reduction of risk and exploit metadata that can be applied to make unstructured information a more valuable asset.
As we approach the fifth anniversary of the passage of the Sarbanes-Oxley (SOx) Act, businesses continue to struggle with the implications of vouching for both the integrity of reported financial data and the information systems that generate that data. After paying enormous fees to accountants and consultants to perform complex SOx-related audits, the truth is that most businesses – perhaps 90% – are still incapable of applying and enforcing critical records management policies relating to data storage, such as identifying information assets and liabilities, structuring individual data items into meaningful business records for both archiving and deletion purposes, and assuring business continuance. The challenge stems from the fact that records management practices require a join on multiple business functions, including IT (storage administration and application management), business administration (clerical work related to operations), and legal (translating legal vulnerabilities into flow-of-data practices). Without direct CEO participation, an effective organizational response to these issues will not be possible.
Action Item: CEOs must catalyze and shepherd the organizational interplay required to forge an effective records management capability in businesses. Only then can the appropriate records management mission and system of authorities emerge to fully satisfy the business objectives and goals born from SOx.
Today structured operational systems maintain records glued together by explicit and implicit metadata. The metadata in systems that store unstructured data is sparse and consists mostly of a few dates and maybe a file name; the exception is email where some standard metadata exists.
Creating a complete and accurate business record requires combining both structured and unstructured data. A simple example is providing a special price for a client order. There are numerous unstructured emails, Excel spreadsheets, signoffs and other data elements that may be combined with structured data entries to form a more complete sequence of events. An effective records management system allows these elements to be connected together to provide an unambiguous record.
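To make the special-price example concrete, the sketch below shows one way such a record could be assembled, assuming every artifact has already been tagged with a shared event key. The `Artifact` and `BusinessRecord` types, their fields, and the `ORDER-1` key are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One component of a business record; all field names are hypothetical."""
    kind: str        # e.g. "order_entry", "email", "spreadsheet", "signoff"
    source: str      # originating system, e.g. "ERP" or "mail"
    structured: bool
    timestamp: str   # ISO 8601, same format and time zone, so text order is time order
    summary: str


@dataclass
class BusinessRecord:
    """Every artifact that shares one business event key."""
    event_key: str
    artifacts: list[Artifact] = field(default_factory=list)

    def timeline(self) -> list[Artifact]:
        """Return the artifacts in chronological order."""
        return sorted(self.artifacts, key=lambda a: a.timestamp)


def assemble(event_key: str, tagged: list[tuple[str, Artifact]]) -> BusinessRecord:
    """Collect every artifact tagged with event_key into one record."""
    return BusinessRecord(event_key, [a for key, a in tagged if key == event_key])


tagged = [
    ("ORDER-1", Artifact("order_entry", "ERP", True, "2007-06-04T10:00:00", "Order at special price")),
    ("ORDER-1", Artifact("email", "mail", False, "2007-06-01T09:30:00", "Client requests discount")),
    ("ORDER-2", Artifact("email", "mail", False, "2007-06-02T11:00:00", "Unrelated thread")),
    ("ORDER-1", Artifact("signoff", "mail", False, "2007-06-03T16:45:00", "Manager approves price")),
]
record = assemble("ORDER-1", tagged)
kinds_in_order = [a.kind for a in record.timeline()]
```

The assembly step is trivial once the event key exists on every artifact; the hard part, as the following paragraphs argue, is getting that metadata onto unstructured data in the first place.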
For records to be managed around key events in business processes, more extensive metadata must be created with a combination of automatic and user inputs. There are no perfect technologies that can retrofit all metadata into existing unstructured data, but it should not be assumed that none will be developed (see discussion tab). Email has some of the metadata required, but is still incomplete. The cost of manually adding metadata is prohibitive and unlikely to take place in most cases.
This means that the essential prerequisite technology is the ability to tag new unstructured data with metadata automatically when the data is created or subsequently used. This will require extensions to applications that create and store unstructured data.
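As a rough illustration of tagging at creation time, the sketch below wraps a file save so that a metadata file is written alongside the content. It is an assumption-laden example, not a description of any existing application extension: the `save_with_metadata` function, the `.meta.json` sidecar convention and the field names are all hypothetical.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(path: Path, content: bytes, author: str,
                       business_context: dict[str, str]) -> Path:
    """Write content, then write a sidecar JSON file describing it.

    The sidecar layout and field names are hypothetical.
    """
    path.write_bytes(content)
    metadata = {
        "file_name": path.name,
        "author": author,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),  # lets later audits detect changes
        "size_bytes": len(content),
        "business_context": business_context,  # e.g. {"order_id": "ORDER-1"}
    }
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Usage: save a document in a temporary directory and read its metadata back.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = save_with_metadata(
        Path(tmp) / "quote.txt", b"special price", "alice", {"order_id": "ORDER-1"}
    )
    meta = json.loads(sidecar.read_text(encoding="utf-8"))
```

Most of the fields here can be captured with no user input; only the business context needs to come from the application or the user, which is where the application extensions mentioned above would do their work.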
Many additional technologies will be required to create effective and high performance records systems. The ability to encapsulate both the data and the metadata together is one such interesting technology. However, this and other inventions are of little value until the metadata is first established.
Action Item: CTOs should make sure that executive management does not assume that metadata can be applied retroactively to existing unstructured data, and point out forcefully that the legal exposure is growing daily in proportion to the amount of data being stored. Implementing technologies that have the ability to create metadata at the time data is stored will be the key determinant of when this exposure can be brought under control.
Retrofitting metadata to the huge and growing volume of unstructured data that companies are accumulating – e-mails, IM sessions, notes, memos, etc. – is one of the major challenges facing organizations as they attempt to create full, organized records from the data on their hard drives.
The underlying technologies supporting enterprise search and natural language processing may form the foundation for building a consistent and defensible (not necessarily perfect) capability to automatically classify the growing corpus of unstructured data owned by organizations today.
In many ways this represents the Holy Grail for records management, as it will remove the end user from the equation. However, the benefits of such a capability are far larger than risk mitigation through improved records compliance. By being able to effectively segregate and treat like unstructured information in the aggregate, organizations will be better able to:
- Find the specific information they require.
- Create new connections and draw inferences across large data sets.
- Free end users from the burden of organizing their information.
- Reduce storage costs.
- Defensibly delete the junk and keep information around for only as long as it is either relevant and useful or required by regulating authorities.
- Leverage emerging Enterprise Digital Rights Management to quickly and securely share information outside the boundaries of the organization.
- Focus security efforts on the high risk information and not worry about the rest.
Action Item: Ideally, going forward, tools will tag / classify information objects at the time of creation, e.g., through an extension to the file / save as process, and store / organize them virtually according to content, use, value, retention period, etc. Under this model the traditional hierarchical file structure disappears, replaced by a content-based semantic structure that is automatically built and maintained by the system rather than by the user.
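One way to picture this content-based structure is as an index from attribute values to objects, so that a single object appears in many virtual "folders" at once. The sketch below assumes objects already carry tags; the `build_views` and `find` functions, the attribute names and the sample values are hypothetical.

```python
from collections import defaultdict


def build_views(objects: dict[str, dict[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Map each (attribute, value) pair to the ids of the objects carrying it."""
    views: dict[tuple[str, str], set[str]] = defaultdict(set)
    for object_id, tags in objects.items():
        for attribute, value in tags.items():
            views[(attribute, value)].add(object_id)
    return dict(views)


def find(views: dict[tuple[str, str], set[str]], **criteria: str) -> set[str]:
    """Return the ids of objects that match every attribute=value criterion."""
    matches = [views.get(pair, set()) for pair in criteria.items()]
    return set.intersection(*matches) if matches else set()


# Illustrative tags only; a real system would assign these automatically.
objects = {
    "doc-1": {"content": "pricing", "retention": "7y", "value": "high"},
    "doc-2": {"content": "pricing", "retention": "90d", "value": "low"},
    "doc-3": {"content": "hr", "retention": "7y", "value": "high"},
}
views = build_views(objects)
pricing_docs = find(views, content="pricing")
long_lived_pricing = find(views, content="pricing", retention="7y")
```

Unlike a folder hierarchy, nothing here depends on where the user chose to save a file: the same object is reachable by content, by retention period or by value, and the index is rebuilt by the system as tags change.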