Imagine taking 1,000 photographs at a family wedding and archiving them to the cloud. The service takes all the files, compresses and encrypts them, and stores them in a single object called “wedding2010”. Fifteen years later, you go to retrieve those pictures. Somewhere in the process of writing the data out, storing the object, moving the data around over 15 years, and finally retrieving it, one bit in 10^8 is flipped. Because the pictures were compressed and encrypted into a single object, that corruption can make the whole object undecodable, and none of the pictures can ever be seen again.
Yupu Zhang et al. of the University of Wisconsin-Madison have done ground-breaking research in which they systematically injected into running systems the kinds of errors that occur naturally. They showed that even in a high-functioning file system such as Sun’s ZFS, data integrity could be compromised. Many of the problems were in the memory management part of the system. They argue persuasively that systems “…should be designed with end-to-end data integrity as a goal.”
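To make the end-to-end idea concrete, here is a minimal Python sketch. It is not taken from the Zhang research or from any vendor's product. The function names, the choice of SHA-256, and the use of zlib as a stand-in for the storage layer are all assumptions made for illustration, and encryption is left out for brevity. The application takes a digest of the data at the source, a single bit is then flipped in memory before the storage layer sees the data, and the storage layer's own checksum passes because it only covers the bytes it was handed. Only the digest taken at the source reveals the corruption.

```python
import hashlib
import zlib


class IntegrityError(Exception):
    """Raised when retrieved data does not match the digest taken at the source."""


def fingerprint(data: bytes) -> str:
    # Digest computed by the application, before any lower layer touches the data.
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit memory error at the given bit position.
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def store(data: bytes) -> bytes:
    # Hypothetical storage layer: zlib adds its own checksum,
    # but only over whatever bytes it is handed.
    return zlib.compress(data)


def retrieve(blob: bytes, expected_digest: str) -> bytes:
    # The storage layer's own check runs inside decompress();
    # the end-to-end check compares the result against the source digest.
    data = zlib.decompress(blob)
    if fingerprint(data) != expected_digest:
        raise IntegrityError("data differs from what the application wrote")
    return data


photos = b"wedding2010 photo data " * 1000
digest = fingerprint(photos)            # taken at the source

# A bit flips in memory before the storage layer sees the data.
in_flight = flip_bit(photos, 12345)
blob = store(in_flight)

zlib.decompress(blob)                   # storage-level check raises nothing
try:
    retrieve(blob, digest)
except IntegrityError as err:
    print("end-to-end check caught it:", err)
```

The point of the sketch is where the digest is taken, not which algorithm is used: a check added at the storage layer cannot see damage done before the data reached it.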
Wikibon believes that no vendor or service currently guarantees end-to-end data integrity. The current architectures of storage systems, memory systems, middleware and applications cannot deliver on such a guarantee. Cleversafe, a Wikibon 2009 CTO award winner, uses its bit-slicing technology to enable a data integrity guarantee for the storage piece of the stack. This technology is being used by government agencies and healthcare providers that need to store large amounts of data for long periods of time. However, these organizations still have much complex work to do on their side to fill in the integrity holes in the rest of the stack.
Consumers expect data integrity (that is, uncorrupted data), and government agencies, especially in Europe, are likely to expect and insist on data integrity guarantees. Wikibon believes that the IT industry should work together to develop the necessary architectures and standards before government agencies mandate them.
Action Item: Senior IT management and their business customers need to identify end-to-end data integrity as an SLA requirement, and should push Infrastructure 2.0 system, service and software providers to include this capability. They should be wary of vendors who make claims of data integrity without acknowledging that their products must fit within an end-to-end data integrity architecture.