When storage requirements approach the hyperscale level, drive failures and errors present a constant challenge. An organization storing a petabyte or more of data can expect hundreds of drives to fail every year, each failure triggering a rebuild that can take a week or more. Knowing that some portion of its drives is certain to fail during any given interval, how can an organization protect against data loss and keep its data online and available? The traditional approach to guarding against these failures is replication. At the petabyte level and beyond, however, maintaining full copies of the data eventually becomes untenable from a technical, financial, and organizational standpoint.
Information dispersal is an alternative approach to achieving high reliability and availability for large-scale unstructured data storage. It is based on erasure codes, a means of adding redundancy to data for transfer and storage. An erasure code transforms a message of k symbols into a longer message of n symbols such that the original message can be recovered from a subset of the n symbols; for an optimal code, any k of the n symbols suffice. Simply put, erasure codes use mathematics to generate redundant data so that only a fraction of what is stored is needed to recreate the original.
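To make the k-of-n idea concrete, here is a minimal sketch of an erasure code in Python, encoding a block of k data bytes as evaluations of a polynomial over the prime field GF(257), a simple Reed-Solomon-style construction. This is an illustration only; the function names, the field choice, and the decoder are assumptions of the sketch, not a description of any particular product (a production system would typically work in GF(2^8) so that coded symbols fit in single bytes).

    # Minimal k-of-n erasure code over the prime field GF(257).
    # A block of k data bytes defines a polynomial of degree k-1;
    # evaluating it at n distinct points yields n coded symbols,
    # any k of which suffice to recover the original block.

    P = 257  # prime modulus; every byte value 0..255 fits in this field

    def encode(data, n):
        """Treat the k bytes of `data` as polynomial coefficients and
        return n slices (x, y) with y = f(x) mod P for x = 1..n."""
        return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
                for x in range(1, n + 1)]

    def decode(symbols, k):
        """Recover the k data bytes from any k slices by solving the
        Vandermonde system A.c = y (mod P) with Gaussian elimination."""
        pts = symbols[:k]
        A = [[pow(x, j, P) for j in range(k)] for x, _ in pts]
        y = [v for _, v in pts]
        for col in range(k):
            piv = next(r for r in range(col, k) if A[r][col])
            A[col], A[piv] = A[piv], A[col]
            y[col], y[piv] = y[piv], y[col]
            inv = pow(A[col][col], -1, P)    # modular inverse (Python 3.8+)
            A[col] = [a * inv % P for a in A[col]]
            y[col] = y[col] * inv % P
            for r in range(k):
                if r != col and A[r][col]:
                    f = A[r][col]
                    A[r] = [(a - f * b) % P for a, b in zip(A[r], A[col])]
                    y[r] = (y[r] - f * y[col]) % P
        return bytes(y)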
An Information Dispersal Algorithm (IDA) builds on the erasure code and goes one step further: it splits the coded data into multiple segments, called slices, which can then be stored on different devices or media to attain a high degree of failure independence. For example, erasure coding alone does little for files on a single computer if that machine's drive fails; but if an IDA spreads the slices across machines, multiple simultaneous failures can be tolerated without losing the ability to reassemble the data.
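Continuing the sketch above, dispersal amounts to placing each slice on a different device and reconstructing from any k survivors. The 3-of-5 parameters and the simulated "devices" here are purely illustrative:

    # Disperse a block with k = 3, n = 5: any 3 of the 5 slices suffice.
    data = b"IDA"                       # a 3-byte block for illustration
    slices = encode(data, n=5)          # imagine one slice per device

    # Simulate two simultaneous device failures (slices 0 and 3 lost).
    surviving = [slices[1], slices[2], slices[4]]

    assert decode(surviving, k=3) == data   # original block recovered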
Information dispersal eliminates the need for replication. Because the data is dispersed across devices, it is resilient against natural disasters and technological failures such as drive failures, system crashes, and network outages. And because only a subset of slices is needed to reconstitute the original data, multiple simultaneous failures across the hosting devices, servers, or networks still leave the data available.
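This resilience can be quantified: with a k-of-n dispersal, data remains available as long as at least k of the n devices are reachable, which is a simple binomial calculation. In the sketch below, the 10-of-16 configuration and the 99% per-device availability are assumed example figures chosen for illustration:

    from math import comb

    def availability(n, k, p):
        """Probability that at least k of n independent devices,
        each available with probability p, are reachable."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    # Example: 10-of-16 dispersal tolerates any 6 simultaneous failures.
    print(availability(16, 10, 0.99))   # ~1 - 1e-10: unavailability drops
                                        # to roughly ten-nines territory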
Action Item: Storage requirements at most commercial organizations have not yet reached hyperscale, but they will within the next several years. CIOs and storage decision-makers at organizations managing large amounts of unstructured data should be alert to signs that their storage infrastructure is approaching the reliability and availability "breaking point": growing reliability and availability exposure, the need to constantly add administrative staff to fight fires, and rapidly rising hardware and facilities costs.
Footnotes: For more on information dispersal, visit www.cleversafe.com.