You are driving with the family on a long journey at 7 p.m. Your dashboard computer displays a selection of restaurants and hotels that match your budget, culinary preferences, and location, with a special offer for a family room. There is an attractive offer from a hotel if you drive another 20 miles. This display is derived from a large amount of data put into context, and the only way to provide such information cost-effectively is to use metadata inferences in real or near-real time.
Metadata, the data that describes data, becomes imperative in the world of “Big Data” and the cloud. As more data is distributed across the cloud and the enterprise, the model of holding central databases becomes less relevant, especially for unstructured and semi-structured data. Moving vast amounts of data from one place to another, within or outside the enterprise, is not economically viable. It is faster and more efficient to select the data locally by shipping the code to the data, as in the Hadoop model. Good metadata is a key enabler of this approach.
There is already some metadata in place; files have a date created/modified and file size, JPEGs have data about the camera settings and location, and there are many other examples. But metadata standards are fragmented and incomplete, and cracking open files to investigate properties requires too much compute and elapsed time.
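As a minimal sketch of the metadata that is already available, the Python below reads a file's size and modification time from the filesystem without opening the file body, then filters a list of files on that metadata alone. The function names `basic_file_metadata` and `select_by_metadata` are hypothetical, chosen for illustration only.

```python
import os
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return name, size, and modification time without reading the file body."""
    info = os.stat(path)  # filesystem metadata only; the contents are not opened
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


def select_by_metadata(paths, min_size_bytes):
    """Keep only the paths whose size meets the threshold, using metadata alone."""
    return [p for p in paths if os.stat(p).st_size >= min_size_bytes]
```

Richer properties, such as the camera settings embedded in a JPEG, are not reachable this way; they require parsing the file itself, which is the cost the paragraph above describes.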
A paper by Tom Coughlin and Mike Alvarado entitled "Angels in our Midst: Associative Metadata in Cloud Storage" is an interesting attempt to put a framework model (Figure 1) in place for metadata. The authors have taken an OSI-like layered model, split into two major components:
- Basic Data Levels – four layers that focus on traditional metadata
- Meaning Data Levels – three layers that focus on meaning and context
IT organizations and vendors should recognize that completely new models of doing business are evolving, and that these depend on an effective metadata model with industry acceptance. Within IT, metadata can help decide which data can be deleted, as well as helping to extract more value from the data that is kept. Current methods of inferring metadata retrospectively are inadequate.
Metadata should be captured as close as possible to the time that the data is created or accessed, and there must be automation with immediate override in the capture of metadata. To meet national and international concerns about privacy, metadata must include strong access and security controls, with an emphasis on the user override (what Coughlin and Alvarado describe as a “Guardian Angel”).
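The capture-with-override idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the model from the Coughlin and Alvarado paper: the `CapturedMetadata` record, its fields, the private-by-default `shareable` flag, and the `capture_at_creation` function are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CapturedMetadata:
    created_utc: datetime
    size_bytes: int
    owner: str
    shareable: bool = False  # assumption: data is private unless the user says otherwise
    overridden_fields: list = field(default_factory=list)


def capture_at_creation(payload, owner, overrides=None):
    """Capture metadata automatically when data is created, then apply user overrides."""
    meta = CapturedMetadata(
        created_utc=datetime.now(timezone.utc),
        size_bytes=len(payload),
        owner=owner,
    )
    for key, value in (overrides or {}).items():
        # Only known fields may be overridden; the audit list itself is off limits.
        if key == "overridden_fields" or not hasattr(meta, key):
            raise KeyError(f"unknown metadata field: {key}")
        setattr(meta, key, value)
        meta.overridden_fields.append(key)  # record what the user changed
    return meta
```

Recording which fields were overridden keeps the automated values and the user's decisions distinguishable later, which matters if the metadata is used for access control.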
Action Item: There should be strong cross-industry support from ISVs, hardware vendors and users for the creation of metadata models and standards. Apple, Google, Microsoft and other developers of software that produces semi-structured data have a particular responsibility to create open metadata standards.