The traditional approach to protecting data is to use RAID locally and then replicate the data to a remote site, where RAID is used again for protection. The telecommunications and data movement costs are not trivial, and you can end up with as many as four copies of the data when mirroring is used at both sites. As companies consider the impact of the marginalization of RAID, there are classes of data for which alternatives to replication should be considered.
Classes of applications that are not well suited to large-scale replication include large distributed databases, file systems, distributed medical records from hundreds of hospitals, and any kind of large unstructured content, including digital content such as image, audio, or archive data. An alternative to replication is information dispersal: splitting data into multiple pieces (slices) such that the original can be recovered from any threshold subset of those pieces. Data dispersal offers the same benefits as replication, such as data integrity and recoverability, but it stores slices rather than full copies and therefore uses fewer resources. Replication remains an important tool, but as companies look to manage and extract value from big data pools, alternative methods should be investigated.
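To make the threshold idea concrete, here is a minimal sketch of information dispersal in Python. It is illustrative only and not any vendor's implementation: the function names `disperse` and `recover`, the choice of 3-of-5 slices, and the use of polynomial interpolation over the prime field of size 257 are all assumptions made for this example. Production systems typically use optimized erasure codes and add integrity checks per slice.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x (mod P) the unique polynomial through the given points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, k: int, n: int):
    """Split data into n slices such that any k of them recover it.

    Hypothetical helper: returns (slices, original_length), where slices
    maps a slice number 1..n to a list of symbols in the range 0..256.
    """
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = {x: [] for x in range(1, n + 1)}
    for off in range(0, len(padded), k):
        # Treat k data bytes as the polynomial's values at x = 1..k
        points = [(i + 1, padded[off + i]) for i in range(k)]
        for x in slices:
            slices[x].append(_interpolate(points, x))
    return slices, len(data)


def recover(subset: dict, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k (or more) surviving slices."""
    if len(subset) < k:
        raise ValueError("not enough slices to reach the threshold")
    xs = sorted(subset)[:k]
    out = bytearray()
    for idx in range(len(subset[xs[0]])):
        points = [(x, subset[x][idx]) for x in xs]
        # The original bytes are the polynomial's values at x = 1..k
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


data = b"big data pools"
slices, length = disperse(data, k=3, n=5)
# Simulate losing slices 1 and 3; any three survivors are sufficient.
survivors = {x: slices[x] for x in (2, 4, 5)}
restored = recover(survivors, k=3, length=length)
```

In this sketch the five slices together hold n/k = 5/3 as many symbols as the padded original has bytes, and any two slices can be lost without losing data. That ratio is the resource saving referred to above when compared with keeping multiple full copies.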
Action Item: Consider eliminating replication for certain applications. Understand the impact of data growth, the limitations of RAID, and the options for data dispersal.
Footnotes: Storage Directions in an Era of Big Data