In a previous post, Wikibon showed that slicing data up and spreading it across a network of 32 independent nodes can increase availability five million times over a traditional replicated copy, while using the same storage resources. Yes, five million times; the calculation is (1 − 99%) / (1 − 99.9999998%) = 5,000,000. In another post the availability is kept constant, and the cost of the secure, highly available cloud system is reduced by an order of magnitude compared with traditional storage array-based topologies. At the same time, companies have extended erasure-coding technology to provide built-in encryption support.
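The five-million-times figure is simply the ratio of the two designs' downtime. A quick sketch, using the availability figures quoted above:

```python
# Ratio of unavailability (downtime) between a traditional
# replicated copy and a 32-node dispersed, erasure-coded store,
# using the availability figures cited in the post.
single_copy = 0.99           # traditional replicated copy: 99% available
dispersed = 0.999999998      # 32-node dispersed store: 99.9999998% available

factor = (1 - single_copy) / (1 - dispersed)
print(round(factor))  # → 5000000
```

In other words, the dispersed store is expected to be unreachable one five-millionth as often as the single replicated copy.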
A simple healthcare example can illustrate the potential for cost reduction for archival systems. Patient records of x-rays, MRI scans, CT Scans, EKGs, MEGs, 2D and 3D scans all take up a large amount of space and have to be kept for up to 100 years. Over time a patient will have multiple doctors and multiple medical facilities in multiple locations, any or all of whom may need to access those images.
The key requirements of such a solution are similar to those of most archiving systems, with the difference that this would be a production system:
- Immutability of the stored objects;
- Preservation of provenance;
- The ability to dynamically change to new technologies over time;
- Extreme reliability (a single lost bit means the object is lost);
- Very high availability (lives may depend on access);
- Cost efficiency that improves over time;
- High level of security.
The most cost-effective solution is to use a logically central, geographically dispersed data store based on encrypted erasure codes, shared across many medical centers. Each center would hold a cached copy, which does not need backup, as it can be rebuilt from the cloud store. The cloud store itself does not need further copies for disaster recovery. A low-cost rolling backup strategy and update logs would still be necessary for recovery from catastrophic software or operator failure. A single dispersed copy would be more accessible and reliable than a best-of-breed three-data-center array-based synchronization topology, and would be 9 to 25 times less expensive.
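The availability advantage of dispersal comes from needing only some subset of slices to reconstruct the data. A minimal sketch of that arithmetic, assuming illustrative figures (32 slices, any 24 sufficient to recover, each node independently 99% available — the k-of-32 threshold is an assumption, not a figure from the post):

```python
from math import comb

def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent nodes,
    each up with probability p, are reachable (binomial sum)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Dispersed store: data recoverable from any 24 of 32 slices,
# each node only 99% available on its own.
dispersed = availability(32, 24, 0.99)

# Contrast: three full replicas, each 99% available.
three_copies = 1 - (1 - 0.99)**3

print(dispersed, three_copies)
```

Even with individually unreliable nodes, the dispersed store's availability comfortably exceeds that of three full copies, while consuming far less raw capacity than triple replication.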
One of the constraints to adoption is the object nature of the resulting storage model. For developers who are familiar with object storage and can take advantage of it, this is good news. For traditional developers and ISVs, this approach can represent a challenge.
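The object model is a different contract from a filesystem: objects are written once, retrieved whole by key, and never updated in place. A toy sketch of that interface, with content addressing standing in for the immutability and provenance requirements above (the class and its methods are illustrative, not any vendor's API):

```python
import hashlib

class ImmutableObjectStore:
    """Toy write-once, content-addressed object store (illustrative only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The content hash is the key, so it doubles as an
        # integrity check: a changed object gets a new key.
        key = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(key, data)  # write-once: never overwrite
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

store = ImmutableObjectStore()
key = store.put(b"mri-scan-frame-001")
assert store.get(key) == b"mri-scan-frame-001"
```

Developers used to seek-and-rewrite file semantics must restructure around whole-object put/get calls, which is precisely the adoption challenge noted above.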
Action Item: Erasure codes will allow highly available dispersed archival systems that are an order of magnitude less expensive than traditional systems. CTOs should be looking hard for opportunities to use these revolutionary topologies to design more cost-effective systems that will be simpler to manage.
Footnotes:
- Additional information on erasure coding and very high availability can be found at Erasure Coding and Cloud Storage Eternity, Wikibon 2011;
- Additional information on different storage topologies and the cost difference for cloud storage can be found at Reducing the Cost of Secure Cloud Archive Storage by an Order of Magnitude, Wikibon 2011.