Storage Peer Incite: Notes from Wikibon’s January 11, 2011 Research Meeting
Is archiving broken? It may be more accurate to say that archiving is non-existent in many companies, and where it does exist it is severely crippled. The irony here is that strong archiving was a basic part of business operations throughout the long history of physical records. Archeologists have discovered 3,000-year-old archives of tax and business records in Mesopotamia and China. In the 19th and 20th centuries, businesses and government agencies maintained large staffs of archivists -- they were called file clerks and librarians. When desktop computers replaced typewriters, that organization withered. Presumably something replaced it, but what? Documents created and stored on someone's hard drive, which disappeared when that hard drive eventually died?
IT departments developed tape-based backup systems that supposedly preserved everything, but actually getting at the documents was hugely difficult and time-consuming: no one knew what was on a given tape, vital tapes were often lost, and those that were found might or might not be readable. It was often easier and less expensive to recreate the document manually by repeating the research than to try to recover it from backup.
The problem became acute in the late 1990s when the legal system discovered the value of electronic records. Courts started issuing subpoenas for electronic documents in tort cases, and governments passed new business regulations in the wake of Enron and other business scandals requiring the preservation and presentation of electronic business records. Businesses suddenly realized that they needed to know what their thousands of formal and informal electronic documents contained, where they were, and how they could be produced when needed.
Today we live with the first generation of electronic archiving, which grew out of a turn-of-the-century legal panic driven in part by high-profile business torts in which large companies had to pay huge fines for failure to produce required documents promptly. Businesses have implemented document search capabilities that their legal departments can use to find the documents they need, and they now think the problem is solved. Problem solved -- right?
Wrong! While the immediate emergency may have been met, businesses still have no idea what they have, and nothing has been done to organize the huge and fast-growing collection of miscellaneous documents that drive their businesses. They are spending large sums preserving terabytes of information, much of which may have little value. Meanwhile, the documents that could provide keys to new or increased revenue streams, or at the least avoid costly duplication of effort, remain hidden and unavailable to those in the organization who need them. Enterprises need a second generation of archiving tools and methodologies to organize these documents, separate wheat from chaff, and make them available to the people across the organization who can use them. This newsletter, based on the January 11, 2011 Peer Incite meeting, explores some of the benefits and issues involved in establishing a true electronic document archive in an organization.
G. Berton Latamore, Editor
Companies are drowning in a deluge of documents to the point that employees can only read a fraction of the documents that pass through their computers and cannot find the information they need, and companies have no idea of what they own or what they should do with it. That is the message that Joe Martins, Managing Director of Data Mobility Group, brought to the January 11, 2011 Peer Incite Meeting.
Too often, he said, companies try to deal with these documents by backing up everything, archiving the tapes and hoping they can find what they need when they need it or by saving everything in multiple locations. This is an expensive solution when dealing with multiple terabytes of documents, only a fraction of which really need long-term preservation. And it is courting disaster, as some enterprises have discovered when they were unable to locate critical documents required in court proceedings.
Too often, the only concerns companies have when archiving these documents are meeting compliance requirements and preparing for potential civil torts. Beyond these immediate concerns, however, the documents often contain valuable information that, used properly, can drive revenue and operational improvement. That cannot happen if the information cannot be found when it is needed, and a simple “save everything” strategy does nothing to solve the problem of finding the right document at the right moment.
What companies need, Martins argued, is an intelligent archiving system that uses metadata to identify each document according to its value, handles each type of document – report, slide deck, photo album – in an optimal way to make it available to those who need it across the enterprise, preserves them for the length of their effective life and destroys them when the cost or legal liability of keeping them outweighs their value. The problem, Martins says, is that this is not easily done and can only be partly automated, and then only after careful planning.
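The value-based handling Martins describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any product: the retention periods, the `legal_hold` flag, and the names `Document` and `disposition` are all hypothetical, and real retention schedules would come from the business groups and legal counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per document type, in days.
# Illustrative values only; they are not taken from the meeting.
RETENTION_DAYS = {
    "report": 7 * 365,
    "slide_deck": 2 * 365,
    "photo_album": 365,
}


@dataclass
class Document:
    name: str
    doc_type: str
    created: date
    legal_hold: bool = False  # assumed flag: never destroy while set


def disposition(doc: Document, today: date) -> str:
    """Return 'retain', 'destroy', or 'review' for one document."""
    if doc.legal_hold:
        return "retain"
    days = RETENTION_DAYS.get(doc.doc_type)
    if days is None:
        # Unknown types go to a person; automation is only partial.
        return "review"
    expires = doc.created + timedelta(days=days)
    return "destroy" if today >= expires else "retain"
```

Documents of an unrecognized type are routed to a person for review, which reflects the point that this kind of archiving can only be partly automated.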
“When I go into a company,” he says, “I find IT around the table and sometimes some executives.” But companies have groups of specialists who use specific types of documents and are the only people who really understand which of those need to be archived for preservation and how they need to be treated and organized for optimal retrieval. Those people typically are not represented at the planning table, and without them IT can only make guesses. The result too often is that when a solution is installed and goes live these groups turn up to complain that it doesn't do what they need.
Even with careful planning and an optimized design, however, the best system can only partially automate the creation of that metadata. People, the creators and users of the documents, have to supply some of it, and people as a rule resist having that extra task added to their jobs.
“I have seen cases where people have entered random keystrokes into the fields for the metadata for their documents,” he says. “Then later they cannot find the document they need. When we find the document for them, we see the random keystroke entries. When we ask why they did that, they say, 'It was faster.'”
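One partial guard against the random-keystroke entries described above is to check user-supplied metadata against a controlled vocabulary at save time. The sketch below is a hypothetical illustration: the field names, the department list, and the two-word title rule are assumptions made for the example, and no check of this kind replaces user motivation.

```python
# Hypothetical controlled vocabulary; a real list would come from the business groups.
ALLOWED_DEPARTMENTS = {"legal", "marketing", "engineering", "sales"}


def metadata_problems(metadata: dict) -> list:
    """Return a list of human-readable problems found in the metadata."""
    problems = []

    # The department must match a known value, ignoring case and padding.
    department = metadata.get("department", "").strip().lower()
    if department not in ALLOWED_DEPARTMENTS:
        problems.append("department: not in controlled vocabulary")

    # Crude plausibility rule (an assumption): a title has at least two words.
    title = metadata.get("title", "").strip()
    if len(title.split()) < 2:
        problems.append("title: expected at least two words")

    return problems
```

A check like this rejects entries such as `asdfjkl` at the moment the document is saved, when the author still remembers what it is, rather than later when nobody can find it.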
Action item: Building an effective document archive requires careful planning that involves the specialized groups within the organization that create and are the primary users of different document types. It also requires that the document creators and users be motivated to provide the metadata needed for effective archiving. They need to be educated about the importance of that metadata and motivated by the right combination of carrots (e.g., the ability to find the document they need easily and quickly) and sticks (e.g., the threat that lost documents can result in major losses to the enterprise in court cases, which could rebound on them and damage their careers).
CIO Challenge: Finding Value in Archiving Solutions Beyond Regulatory Compliance, Ediscovery and Storage Management
Every archiving software, hardware and services vendor for the last several years has gone to market with one or more of the following three simple statements:
- Chances are your organization will be facing litigation in the future.
- Retention policies are a must in any organization.
- Storage costs are out of control.
While compliance, litigation readiness and decreasing costs are all critical business drivers, and archiving solutions have proven to be effective in mitigating these risks, too many organizations are missing an opportunity to leverage their information assets by not extending their information management tools beyond departmental needs. Wikibon guest speaker Joe Martins, Managing Director of Data Mobility Group, led a discussion of this issue under the topic "Is Archiving Broken" during the January 11, 2011 Peer Incite Meeting.
The vast majority of archiving vendors thrive on organizational chaos accompanied by fear, uncertainty and doubt (FUD). In particular, point solutions vendors focus on the amelioration of a particular pain point such as the cost of the ediscovery review and collection process or optimizing storage through deduplication, compression or single instancing features.
These products or features all decrease costs in the areas they address but may add operational expenses through poor integration with existing processes and systems or the need for additional IT support and intervention. Without an IM plan, organizations can manage their expenses but not easily leverage these applications for other purposes.
Most user organizations lack a cohesive Information Management (IM) strategy, and if they have one at all it usually falls on the CIO organization to define and operationalize the IM strategy for the entire enterprise.
A big part of the problem is that senior management typically has not bought into the potential benefits of implementing an archive. Archiving is viewed not as strategic but as a necessary evil to help ward off litigation, support regulatory and compliance requirements, and reduce storage costs. CIOs and their staff may also be responsible for failing to identify opportunities and make the proper business case.
Finding New Archiving Opportunities
Forward-thinking organizations are leveraging their archiving solutions, which may contain a treasure trove of information assets, in various ways. For example, messages between employees and customers archived for compliance purposes can also provide valuable insight for customer service, marketing, product development, and sales departments to improve customer satisfaction and retention as well as drive revenue. Information archives can further enable the use of tools such as sentiment analysis, content analytics, and enterprise search because archived content is indexed and often usefully classified.
A Central or Unified Archive, whether logical or physical, cloud based, in-house, hosted or hybrid, can, if implemented correctly, dramatically reduce ediscovery costs by streamlining the collection and review process, assist in optimizing storage through stubbing and single instancing of messages, and improve the time it takes employees to find content for any purpose.
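Single instancing, mentioned above as one route to storage optimization, can be illustrated with a small content-addressed store. This is a minimal sketch assuming identical messages are detected by a SHA-256 digest of their bytes; the class and method names are hypothetical and do not describe any vendor's implementation.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one copy of each distinct content, however many messages share it."""

    def __init__(self):
        self._blobs = {}  # digest -> content bytes (stored once)
        self._refs = {}   # message id -> digest

    def add(self, message_id: str, content: bytes) -> str:
        """Archive a message and return the digest of its content."""
        digest = hashlib.sha256(content).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, content)
        self._refs[message_id] = digest
        return digest

    def get(self, message_id: str) -> bytes:
        """Retrieve the content for a message id."""
        return self._blobs[self._refs[message_id]]

    def stored_bytes(self) -> int:
        """Total bytes actually held, after single instancing."""
        return sum(len(blob) for blob in self._blobs.values())
```

An attachment mailed to a hundred recipients would occupy the space of one copy here, while each recipient's message id still resolves to the full content.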
Archive versus Backup
CIOs should not confuse archive with backup. Many organizations that have not deployed enterprise information archives and have faced court- or regulator-mandated ediscovery requests for electronically stored information (ESI) have found that savvy judges and lawyers will demand that ESI be produced from backup tapes. This is the most expensive and time-consuming process for producing required electronic content.
One large financial services firm estimated it used to take them more than 300 hours to retrieve ESI from tapes, file shares, and other sources for a single request. Today, after implementing a unified archive, that same process takes approximately one hour. For those organizations that still haven’t implemented an archive, forensic services are available at a premium to meet legal or regulatory requirements, and point solutions also exist to meet specific needs.
Most vendors don’t want buyers to develop a strategy for managing their information assets, as they view strategy as an impediment to making a sale. Any comprehensive IM strategy must include an archive, whether in-house or hosted. Whenever possible, archived data should be shared across the organization, not just hoarded in silos by the departments that provided the initial funding.
Action item: CIOs need to look beyond the traditional justifications for implementing archiving solutions of compliance and litigation support or storage optimization and make a case for why and how archiving solutions can be leveraged across the enterprise to support new business opportunities, improve customer service and employee productivity.
Footnotes: For additional thoughts on this topic read Gary's blog Plan Now for the IM Solutions Future
Clairvoyance isn't a job requirement in IT, but perhaps it should be. Today's IT managers and their many applications struggle with the unenviable task of guessing the ownership, importance, relationships, and value of their organization's digital assets because business personnel/applications share little information with the back-end IT processes--such as backup and archive--that support them. Consequently, creating an archive that contains mostly valuable assets with as little garbage/waste as possible is unnecessarily difficult and costly.
In other words, an archive is only as valuable as the information selected to be stored in it. Without sufficient information sharing, many organizations are left to guess at what should be archived and what should be discarded. Too often they choose to "save everything"--a strategy that can be even more costly in the long run.**
Business applications are rich with precisely the sort of metadata that would enable IT to shift from guesswork to fact-based asset management if only the data were more easily shared between business and IT. Sharing is achieved by way of openness both in the business application layer (to enable metadata pass-through to back-end processes) and in the storage layer (to customize/configure an archive to accept more than just individual files).
Going forward, repositories should be designed to be as open and extensible as possible to accommodate the diversity of information assets and metadata. Business application developers must take responsibility for their designs and make it easier for information assets and corresponding metadata to flow down into infrastructure applications to be culled, backed up, and archived.
However, it is up to business and IT stakeholders to determine which information must be shared/stored and identify/adopt the business and infrastructure applications capable of sharing/storing it. Only then will IT managers have the detailed insight necessary to ensure that their organization's archival resources aren't wasted.
Action item: Organizations must first identify the types of information assets and value-enhancing metadata that should be archived. Stakeholders are the number one resource for this information. At least one representative from each group within the company should have an opportunity to make a case for the data the group believes must be archived (or discarded) and why. It is then up to the organization to identify and transition over time to business and infrastructure applications that fully support its diverse information assets and metadata from its front-end down into its backups and archives.
Footnotes: **Imagine terabytes and eventually petabytes of non-value-add information assets consuming precious storage resources. Consider the CAPEX and OPEX impact of archiving information waste for any length of time, and its long-term operational artery clogging effects.
During the January 11th Peer Incite meeting to discuss the topic “Is Archiving Broken”, the Wikibon team along with guest speaker Joe Martins made the case that archiving solutions can be of critical importance to organizations trying to improve not only their compliance and e-discovery readiness but also their customer service, internal communications and employee productivity.
According to IDC, the average worker spends almost 10 hours per week searching for and analyzing information. Organizations that are doing without enterprise-wide information archiving solutions or hosted archives, even if they have deployed point solutions for e-discovery and enterprise search, are probably losing many productive employee hours.
But simply “reacting” to the need for an enterprise archive solution without properly planning ahead can also lead to wasted resources and marginal results. Wikibon recommends organizations follow a 10-step process, derived from the experiences and best practices of Information Management (IM) professionals who have successfully implemented solutions for their companies. (View this Wikibon How To Note for more detail)
- Recognize/Identify Problems
- Find Champions, Build Consensus
- Set Goals: Save Time, Money and Comply
- Do a Presentation for Executive Buy-in
- Create Cross-Functional Team
- Do Research, Develop a Plan
- Redefine Policies and Requirements, Create Budgets, Know your Scope
- Set Technology Direction (In-house vs. hosted) Auto-Classification
- Find a True Partner, Vet Vendors and their References
- Manage the Process, Test and Measure Results
Wikibon also recommends taking a survey of existing systems – you may have technology already in place that is not being used properly or leveraged across departments that can meet short term objectives while you put your plan in place. Developing a long term IM strategy does not preclude continuing to manage problem areas such as data sprawl, e-discovery and compliance requirements with existing resources.
Having clear business objectives for the project, executive buy-in, cross-functional collaboration and a strategic implementation plan in place before committing additional large sums of time and money to an archiving project and vendors or service providers will only improve the chances that the next generation solution will meet and perhaps exceed your organization’s expectations. Technology is an enabler of the business strategy.
Action item: Review existing solutions and capabilities to determine how best to manage the present state of IM readiness. Begin the next generation archiving solution search with the business need as the first priority followed by the development of a strategic IM plan.
Vendors selling archival solutions tend to focus on regulatory requirements (SOX, HIPAA, etc.) as a stick to get customers to purchase and end users to comply. One of the problems with this method is that the burden of tagging or creating metadata gets thrown on the end users without getting their buy-in or acceptance. Vendors need to do more than just sell the solution; they need to help customers adopt the technology in a way that is not onerous and, where possible, help add value to the company through revenue opportunities.
Museums have some of the most visible success stories in creating value from archives – see the recent JFK Digital Archive or Cooperstown. Engagements should include working with customers to fully understand the requirements and investigate value opportunities for long-term retention. This then needs to be broadly communicated to internal stakeholders, not simply handed down as a mandate. Cultural norms are difficult to change, but to be able to intelligently manage the environment and not simply extend all backup into the archive will require the buy-in of those that are closest to the data. Vendors have the opportunity to partner with customers to improve business processes.
Action item: Archiving is part of the Big Data trend. Vendors should help businesses move beyond seeing the growth of retained data as a challenge and help them use that information to extract value.
Many an email archive system in production today has been justified by the looming and pressing mandates of a lawsuit. The justification comes from the legal department, based on cost reduction in the discovery process and reduction in the expected loss from the lawsuit. The archiving and IT communities know that the implementation is a kludge and of no use to any other department, but that will be fixed "later".
There is a compelling argument that "later" should be much sooner. The ongoing cost of keeping the (say) email archive in place grows each year as the number and size of emails grow. Meanwhile, that data cannot be used by other departments to (say) track customer satisfaction.
Perhaps most important of all, there is an emerging breed of big data approaches, using very different tools and databases, that are unlocking the value in data from all sources, including archived data.
Action item: The archiving teams and senior management should fight to avoid archived data silos, especially those justified only by fear and compliance. Existing data silos should be eliminated as soon as possible and subsumed into general archives that meet the needs of the organization as a whole within its specific industry. Justification should be based on value creation and on the ability to work with big data tools such as Hadoop.