As storage systems scale into the multi-petabyte range, RAID is becoming a less viable long-term solution due to the increasing probability of data loss and onerous rebuild times. This was the premise put forth by Chris Gladwin of Cleversafe to the Wikibon community at the November 30th Peer Incite.
What is the Problem with RAID?
VIDEO (4:15) – Chris Gladwin Quantifies the Issue with RAID
Specifically, the unrecoverable bit error rates (BER) of conventional disk drives are specified in the range of one error per 10^14 or 10^15 bits read. A 10TB drive holds roughly 10^14 bits, approaching the crossover point where encountering an error during a full read of the drive is no longer just statistically possible but probable.
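To put the crossover in perspective, here is a rough back-of-the-envelope sketch (with illustrative numbers, not any vendor’s specifications) of the chance of hitting at least one unrecoverable read error while reading an entire drive, assuming a specified rate of one error per 10^14 bits and independent errors:

```python
# Rough sketch: probability of at least one unrecoverable read error (URE)
# while reading a full drive once, using the Poisson approximation
# P = 1 - exp(-bits_read * ure_rate). Figures are illustrative only.
import math

URE_RATE = 1e-14  # unrecoverable errors per bit read (a common drive spec)

def p_full_read_error(capacity_tb: float, ure_rate: float = URE_RATE) -> float:
    bits = capacity_tb * 1e12 * 8  # decimal terabytes to bits
    return 1.0 - math.exp(-bits * ure_rate)

for tb in (1, 2, 10, 30):
    print(f"{tb:>2} TB drive: ~{p_full_read_error(tb):.0%} chance of a URE per full read")
```

At roughly 10TB the expected number of errors per full read approaches one, which is the crossover Gladwin describes.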
In addition, drive rebuild times are becoming dramatically elongated. For example, according to Gladwin, in the early 1990s it took under a minute to rebuild a Maxtor 40MB hard drive. Today, a 2TB drive can take 8 hours or more to rebuild. Within the next six years we will see 30TB disk drives which, if nothing changes, will take nearly a month to rebuild.
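The rebuild-time trend is simple arithmetic: time grows with capacity divided by the effective rebuild rate, and that rate often drops sharply when the array must also serve production I/O. A minimal sketch, using assumed (not measured) rebuild rates:

```python
# Illustrative only: minimum rebuild time = capacity / sustained rebuild rate.
# Real rebuilds are usually slower still because of controller overhead, host
# I/O contention, and throttling; the rates below are assumptions, not benchmarks.

def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    return capacity_tb * 1e6 / rebuild_mb_per_s / 3600

print(f" 2 TB at 70 MB/s (idle array)      : {rebuild_hours(2, 70):5.1f} hours")
print(f"30 TB at 15 MB/s (under host load) : {rebuild_hours(30, 15) / 24:5.1f} days")
```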
With such long rebuild times, the chance of multiple drive errors, which historically has been very slight, increases dramatically. Further complicating the issue is the increased proliferation of encryption, deduplication and compression techniques, creating the possibility that the loss of a single bit will make all the data on a drive unrecoverable.
VIDEO (2:27) IBM’s Brett Cooper Talks About the Application Performance Implication of RAID Recovery
What is Meant by RAID?
VIDEO (2:18) – What is Meant by RAID?
In 1987, David Patterson, Garth Gibson, and Randy Katz published a technical paper entitled A Case for Redundant Arrays of Inexpensive Disks. In the paper the authors put forth the idea that arrays of smaller personal computer disk drives could replace larger devices and offer orders-of-magnitude improvements in performance, reliability, scalability, and power consumption.
The problem with the concept was that using many more small PC drives meant inherently lower reliability, which the authors addressed by adding fault tolerance. The result was higher availability in the form of redundant disks through mirroring and parity-based RAID.
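To see how parity delivers that fault tolerance, here is a minimal sketch of single-parity (RAID 5 style) reconstruction, using short byte strings as stand-ins for same-sized drive stripes:

```python
# Minimal sketch of single-parity protection: the parity block is the XOR of
# the data blocks, so any one missing block can be recomputed by XOR-ing the
# survivors. Byte strings stand in for equal-sized drive stripes.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, byte_tuple) for byte_tuple in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]  # three data stripes (hypothetical contents)
parity = xor_blocks(data)           # stored on the parity drive

# Simulate losing the second drive and rebuilding it from the survivors + parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]           # the lost stripe is recovered exactly
```

Because XOR can only solve for one missing stripe, a second concurrent failure leaves nothing to reconstruct from; RAID 6 adds a second, independently computed parity to cover that case.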
Gladwin’s premise is primarily focused on parity-based RAID: single-parity RAID 5 and dual-parity RAID 6, the latter of which can tolerate up to two simultaneous drive losses in an array. As Gladwin points out, conventional parity math does not extend to a third dimension, so the industry has instead turned to making more copies of data, which becomes an expensive solution. At some point, making multiple copies to offset longer rebuild times becomes unsustainable.
Gladwin’s scenario indicates the industry will eventually hit a wall. For example, with 1 petabyte of storage on RAID 5, even with multiple copies, the odds of annual data loss are around 80%. RAID 6 extends the window and is adequate for 1TB drives, but with 2TB drives in a 1,000-disk system there is a 5% chance of annual data loss. That figure rises to 40% with 8TB drives, and eventually the chance of annual data loss will approach 100%.
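Those figures are Gladwin’s. The underlying intuition can be sketched with a deliberately simple, hypothetical model in which data is lost only when additional drives in the same RAID group fail inside an earlier failure’s rebuild window. The failure rates and rebuild windows below are assumptions, and the model ignores unrecoverable read errors during rebuild, which in practice push the numbers much higher:

```python
# Illustrative model only (it will not reproduce the quoted figures): data loss
# occurs when extra drives in the same group fail during the rebuild window.
HOURS_PER_YEAR = 8760

def p_group_loss_per_year(width, afr, rebuild_hours, parity_drives):
    """Approximate annual data-loss probability for one parity-protected group."""
    p = width * afr  # expected first failures per group-year
    for k in range(1, parity_drives + 1):
        # each additional failure must land inside the rebuild window
        p *= (width - k) * afr * rebuild_hours / HOURS_PER_YEAR
    return p

def p_system_loss(groups, **kw):
    return 1 - (1 - p_group_loss_per_year(**kw)) ** groups

# Assumed: 1,000 drives as 125 groups of 8, 3% annual failure rate per drive.
for drive_tb, rb_hours in [(2, 24), (8, 96)]:
    p5 = p_system_loss(125, width=8, afr=0.03, rebuild_hours=rb_hours, parity_drives=1)
    p6 = p_system_loss(125, width=8, afr=0.03, rebuild_hours=rb_hours, parity_drives=2)
    print(f"{drive_tb} TB drives: RAID 5 ~{p5:.1%}/yr, RAID 6 ~{p6:.4%}/yr")
```

The absolute numbers matter less than the direction of the curve: as rebuild windows stretch from hours to days, the loss probability climbs with them.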
What are Some Solutions?
VIDEO (4:18) Josh Krischer and Chris Gladwin Talk About Potential Solutions
Several Wikibon members argued that the supplier community is well aware of the challenges and is working to address this problem. The general consensus was that RAID will not die, but its form will probably change. Techniques discussed include smaller 2.5” drives with reduced capacity (which rebuild faster), triple-parity RAID, and cluster RAID.
According to Gladwin, the clear long-term answer is multi-dimensional parity schemes or advanced encoding techniques that can accommodate multiple simultaneous failures (beyond two) without making extra copies. Gladwin points out that these techniques are based on Reed-Solomon math, also known as forward error correction or erasure coding. Other industries already rely on them: consumer electronics, for example, uses 56-of-64 encoding on DVDs that can tolerate eight simultaneous bit failures and survive scratches on the disc, and mobile digital telephony uses forward error correction to communicate reliably over channels whose bit error rates are far worse than one in 10^14. The point is that advanced techniques already exist that can be adopted in enterprise IT settings.
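As a concrete illustration of the k-of-n idea behind erasure coding, here is a toy code in the Reed-Solomon spirit: k data symbols define a polynomial, n > k evaluations of it become the stored slices, and any k surviving slices recover the data. Production systems work over GF(2^8) with heavily optimized arithmetic; this sketch uses a small prime field and made-up parameters purely for clarity:

```python
# Toy k-of-n erasure code: the k data symbols are the coefficients of a
# degree-(k-1) polynomial over a prime field; the n stored slices are
# evaluations of that polynomial, and any k of them recover the data via
# Lagrange interpolation.
PRIME = 257  # field just large enough to hold one byte per symbol

def encode(data_symbols, n):
    """Evaluate the data polynomial at x = 1..n to produce n (x, y) slices."""
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(data_symbols)) % PRIME)
            for x in range(1, n + 1)]

def decode(slices, k):
    """Recover the k data symbols from any k surviving (x, y) slices."""
    points = slices[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1  # Lagrange basis polynomial for point i
        for j, (xj, _) in enumerate(points):
            if i == j:
                continue
            # multiply the basis polynomial by (x - xj)
            basis = [(a - xj * b) % PRIME for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (xi - xj) % PRIME
        inv = pow(denom, PRIME - 2, PRIME)  # modular inverse of the denominator
        for d in range(k):
            coeffs[d] = (coeffs[d] + yi * inv * basis[d]) % PRIME
    return coeffs

data = [ord(c) for c in "DISPERSAL!"]  # k = 10 data symbols (hypothetical payload)
slices = encode(data, n=16)            # 16 slices: any 10 are sufficient
survivors = slices[6:]                 # lose six slices, keep ten
assert decode(survivors, k=10) == data
```

With 16 slices and a threshold of 10, the layout survives six simultaneous losses without storing a single full copy of the data, which is exactly the property the dispersal discussion below relies on.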
The issue, according to many in the Wikibon community, is semantics. For example, the industry is changing the granularity of the fault domain from a drive to a portion of a drive, a head, or even slices of data. Is that considered RAID, and how will it be marketed? This is one of those cases where only time will tell, but it is likely new marketing terminology will emerge in force.
VIDEO (6:21) Rob Peglar on Reducing the Granularity of the Fault Domain
What About Dispersal?
VIDEO (2:54) What is Dispersal?
The traditional way of storing data is to save files and volumes together. If a backup is needed, a copy is taken for protection, stored in another location, and often replicated over a network. The result is multiple copies being stored, which can become expensive, incurring 300% or higher overhead in some cases. That is fine for transactional data with high value, but not appropriate for many forms of unstructured data such as media files and archives, which are large and rarely accessed.
Dispersed storage uses a derivative of Reed-Solomon encoding. The data is broken up into slices (say 16) that are spread across multiple arrays in several locations. Algorithms then allow the data to be located and reassembled as required. Even if up to six of those slices are compromised, 100% of the data can still be reassembled from the remaining parts. Conversely, if the slices stored at a single site are stolen, no usable data can be reconstructed from them.
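The economics follow directly from those parameters. A rough comparison, assuming a hypothetical 10-of-16 layout (16 slices, any 10 sufficient) versus keeping extra full copies:

```python
# Back-of-the-envelope overhead comparison; the 10-of-16 layout is an assumed
# example, not Cleversafe's actual configuration.

def copies_overhead(extra_copies: int) -> float:
    """Extra capacity consumed by keeping additional full replicas."""
    return float(extra_copies)

def dispersal_overhead(threshold: int, width: int) -> float:
    """Extra capacity of a threshold-of-width dispersal layout."""
    return width / threshold - 1.0

print(f"3 extra copies     : {copies_overhead(3):.0%} overhead, tolerates 3 lost copies")
print(f"10-of-16 dispersal : {dispersal_overhead(10, 16):.0%} overhead, tolerates 6 lost slices")
```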
Gladwin talked specifically about Cleversafe’s approach to dispersal, which includes not only erasure coding or Reed-Solomon techniques to improve reliability but also coding techniques that provide added security capabilities. Cleversafe’s dispersal virtualizes the data itself, transforming it into dispersed information as it is stored and transforming it back as it is read. The technique can be applied across servers, within an array, or potentially even within a drive. For example, when a piece of data (e.g. a 1 MB object) needs to be written, instead of writing it to one drive or spreading segments of it over multiple drives, the data is transformed into small elements (slices) that are not segments of the data but rather equations that can be used to rebuild the data at a later point in time.
Reads and writes are parallelized, and the slices by themselves are useless and can endure multiple simultaneous failures. The bottom line is dispersal is capable of storing petabytes or exabytes of data with the inherent properties of reliability, data integrity, and secure data delivery.
Will Dispersal Replace RAID?
VIDEO (1:36) Will Dispersal Replace RAID?
Gladwin’s scenario, which was echoed by many in the Wikibon community, sees dispersal being deployed in very large multi-petabyte environments where information security, high availability, and/or high data integrity are fundamental requirements.
For the foreseeable future dispersal will not be focused on transactional database environments, where RAID will continue to perform well and be combined with flash, for example. In these instances, multiple copies are warranted as data sets are smaller and of higher value. Dispersal will fit well with large scale applications such as social networks, medical images, certain government applications, archiving, intelligence applications and the like where storage will reach into the tens of petabytes in capacity.
The Bottom Line
RAID is not dead. However, RAID as we know it has reached an inflection point. The crossover between drive bit error rates and drive capacities, combined with onerous rebuild times, is forcing the industry to think beyond traditional parity-based RAID and introduce more granular techniques for protecting data.
In addition, new forms of storage, such as dispersal, are being introduced by startups like Cleversafe as well as established companies such as EMC and NEC. Dispersal can provide both reliability and data integrity at cloud scale and support the explosion of data growth without the need for expensive copies to offset rebuild times.
There are two main tradeoffs with dispersal. First, it is math heavy: parity-based RAID is computationally simple, whereas dispersal requires more processing power to run the encoding. Second, dispersal cannot be universally applied due to performance constraints. Nonetheless, Moore’s Law favors the dispersed approach by ensuring the computing power will be there, and as cloud computing drives data growth to new heights, dispersal techniques will prove to be well positioned for many emerging applications.
Action Item: Traditional RAID architectures are not well equipped to handle multi-petabyte applications that require high degrees of data assurance and integrity. New requirements are emerging to support huge distributed data farms, and more efficient techniques must be adopted to support future apps. IT organizations need to recalibrate their notion of reliability, data integrity, and information risk exposure to determine how far RAID can take them and apply emerging techniques such as dispersal to those applications that cannot bear the tradeoffs of conventional parity-based RAID.