Eighty-five percent of the storage spinning in the data center holds unstructured data. Most IT organizations have storage infrastructures and processes designed to deliver good response times, throughput, availability and business continuance for tier 1 transactional systems. Applying those practices to unstructured data is overkill.
Storage in high-performance arrays (such as EMC DMX, Hitachi USP, IBM 8300) costs ~$20 per GB. Modular storage costs $10-15 per GB. New entrants such as 3PAR have slightly lower costs but use the same model of proprietary hardware and software. The raw cost of storage is less than $1 per GB. Clustered storage arrays, based on commodity storage and system components, can have much lower price points than high-performance arrays, as Google, Amazon and others have demonstrated. These clustered storage systems will offer different functionality from traditional arrays, functionality that will facilitate the execution and operation of Web 3.0 applications.
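To make the price gap concrete, here is a minimal sketch of the arithmetic using only the per-GB figures quoted above. The 100 TB estate size and the names `PRICE_PER_GB` and `cost_range` are hypothetical, chosen for illustration; the raw-disk figure is treated as an upper bound because the text says only "less than $1 per GB".

```python
# Per-GB prices in USD as quoted in the text, kept as (low, high) ranges.
PRICE_PER_GB = {
    "high_performance_array": (20.0, 20.0),
    "modular_array": (10.0, 15.0),
    "raw_disk": (1.0, 1.0),  # upper bound: the text says "less than $1"
}


def cost_range(capacity_gb, tier):
    """Return the (low, high) cost in USD of holding capacity_gb on a tier."""
    low, high = PRICE_PER_GB[tier]
    return capacity_gb * low, capacity_gb * high


# Hypothetical estate: 100 TB (decimal), of which 85 percent is unstructured.
total_gb = 100_000
unstructured_gb = total_gb * 85 // 100  # integer arithmetic, 85,000 GB

for tier in PRICE_PER_GB:
    low, high = cost_range(unstructured_gb, tier)
    print(f"{tier}: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, the unstructured portion alone would cost about $1.7 million on high-performance arrays against at most $85,000 at raw disk prices. Real clustered systems would land somewhere between the raw figure and the array figures once servers, software and operations are included.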
One of the biggest challenges for IT is developing a strategy for managing unstructured data. This data represents both a potential liability in litigation and a source of value to the organization.
One thrust for managing unstructured data has been to develop classification taxonomies, whether from user input, from automatic classification inference methods, or from available metadata. Another thrust has been to remove duplicate data to lower the cost of storage. The success of these approaches to date has been mixed.
Another approach that may have broader applicability is to simply use very low cost clustered storage systems and use search and classification based on how data is accessed. This is a much simpler way to extract value from the data and much easier to implement. For users it would be a natural extension of their web experience.
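As a rough illustration of ranking by how data is accessed rather than by a preset taxonomy, the sketch below orders keyword matches by how often each document has been opened. It is a toy under stated assumptions, not a description of any product: the function names, the `(user, document)` access-log format and the sample documents are all hypothetical.

```python
from collections import Counter


def access_counts(access_log):
    """Count how often each document appears in a list of (user, document) events."""
    return Counter(doc for _user, doc in access_log)


def search(query, documents, access_log):
    """Return names of documents containing every query term, most-accessed first."""
    counts = access_counts(access_log)
    terms = query.lower().split()
    hits = [
        name
        for name, text in documents.items()
        if all(term in text.lower() for term in terms)
    ]
    # Sort by descending access count; break ties by name for a stable order.
    return sorted(hits, key=lambda name: (-counts[name], name))


# Hypothetical sample data.
documents = {
    "a.doc": "quarterly storage budget",
    "b.doc": "storage array budget review",
    "c.doc": "holiday schedule",
}
access_log = [("u1", "b.doc"), ("u2", "b.doc"), ("u1", "a.doc")]

results = search("storage budget", documents, access_log)
print(results)
```

No one has to classify the documents up front in this scheme; the ordering adapts as the access log grows, which is the property the approach relies on.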
Pooling data and creating an ecosystem with partners, suppliers and customers could further enhance the value of unstructured data. Combining this with data from the Web would lead to natural extensions of these search applications and the development of new ones, dynamically adapting to how users exploit the data to drive business value rather than using a preset classification scheme.
Action Item: IT organizations should continue to invest in traditional arrays for transactional systems, and should continue to use simple archive systems to reduce the major risks that may exist in unstructured data. However, they should minimize investment in exploiting unstructured data and make no major changes to unstructured data management until distributed clustered storage networks become generally available, from 2008 onwards.