Today the typical street-price acquisition cost of SAN-based arrays (enterprise storage) runs 10 to 15 times the price of commodity disk drives, i.e., drives purchased off the shelf at Fry’s or NewEgg: a 10:1 to 15:1 ratio. By leveraging the latest advances in erasure coding, very large companies with substantial in-house development resources can build their own solutions from commodity hardware and achieve a much lower ratio than traditional RAID-1 or RAID-5 SAN arrays provide. Shutterfly, for example, has targeted a 3:1 ratio. These proven techniques have the added benefits of validating data and dramatically reducing the risk of data loss during drive rebuilds.
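To make the protection trade-off concrete, the following Python sketch compares the raw-capacity overhead of each scheme with the number of drive failures it can survive. The 4+1 RAID-5 and 10+4 erasure-code layouts are illustrative assumptions, not Shutterfly's actual configuration, and the class and function names are hypothetical. The sketch counts capacity only; it does not model the hardware and software costs behind the price ratios above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    data_units: int    # k: drives (or chunks) holding user data
    parity_units: int  # m: redundant drives (mirror copies or parity)

    @property
    def overhead(self) -> float:
        # Raw capacity consumed per unit of usable capacity: (k + m) / k
        return (self.data_units + self.parity_units) / self.data_units

    @property
    def failures_tolerated(self) -> int:
        # Mirroring and MDS erasure codes (such as Reed-Solomon) can lose
        # any m units of a group without losing data.
        return self.parity_units


# Illustrative layouts (assumptions, not vendor or Shutterfly figures)
SCHEMES = [
    Scheme("RAID-1 mirror", 1, 1),
    Scheme("RAID-5 (4+1)", 4, 1),
    Scheme("Erasure code (10+4)", 10, 4),
]


def raw_capacity_needed(scheme: Scheme, usable_tb: float) -> float:
    """Raw TB that must be purchased to deliver usable_tb of protected capacity."""
    return usable_tb * scheme.overhead


if __name__ == "__main__":
    for s in SCHEMES:
        print(f"{s.name:22} overhead {s.overhead:.2f}x, "
              f"survives {s.failures_tolerated} failure(s), "
              f"raw TB for 100 TB usable: {raw_capacity_needed(s, 100):.0f}")
```

Under these assumptions, the 10+4 code needs less raw capacity than mirroring while surviving four concurrent failures instead of one, so a second or third drive failure during a long rebuild does not cause data loss.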
CIOs who lack the in-house resources to develop their own integrated solution, or who want to defer up-front acquisition costs and reduce management expenditures, may want to consider public cloud-based storage. Archiving applications are a logical starting point, because archive access can tolerate longer latency. CIOs will initially find cloud-based offerings alluring. Over time, however, as archives reach scale and storage systems built on advanced erasure coding become commercially available, CIOs may find it more cost-effective to bring archive data back in-house or to adopt a hybrid public/private cloud approach.
As data moves into and out of cloud-based storage providers, data validation will be key. Erasure coding, which provides embedded data validation, is a logical way to address this requirement.
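As a minimal illustration of how redundancy doubles as validation, the Python sketch below uses the simplest erasure code, a single XOR parity chunk (the same principle behind RAID-5). It is not one of the advanced multi-parity codes discussed above, and all function names are hypothetical. Recomputing parity detects a corrupted chunk, and the parity can rebuild one missing chunk. Single parity cannot tell which chunk is bad; codes with more parity chunks, such as Reed-Solomon, can also locate and correct errors.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def encode(chunks: list[bytes]) -> bytes:
    """Parity chunk = XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def validate(chunks: list[bytes], parity: bytes) -> bool:
    """Recompute parity; a mismatch means at least one chunk has changed."""
    return encode(chunks) == parity


def rebuild(chunks: list[bytes | None], parity: bytes) -> list[bytes]:
    """Recover a single missing chunk (marked None) from survivors plus parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing chunk")
    survivors = [c for c in chunks if c is not None]
    restored = list(chunks)
    # XOR of parity and all surviving chunks leaves only the missing chunk
    restored[missing[0]] = reduce(xor_bytes, survivors, parity)
    return restored


if __name__ == "__main__":
    data = b"archive object payload"
    chunks = split(data, 4)
    parity = encode(chunks)
    print("intact:", validate(chunks, parity))
    damaged = list(chunks)
    damaged[2] = None  # simulate a lost drive
    recovered = b"".join(rebuild(damaged, parity))[:len(data)]
    print("recovered:", recovered == data)
```

A real archive would also record the original object length (used here to strip padding) and would typically pair the code with per-chunk checksums, so that a validation failure can be traced to a specific chunk.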
Action Item: When evaluating cloud-based data archiving solutions, CIOs should pay particular attention to how large the archive is likely to grow, the cloud storage provider's data-protection approach, and how data migrated out of the cloud will be validated. Advanced erasure coding techniques are an attractive approach: they can reduce both the cost of the service and the potential for data loss while providing embedded data validation.