Storage Peer Incite: Notes from Wikibon’s November 30, 2010 Research Meeting
So is RAID dead? No, not yet at least. But it is facing increasing pressure as hard drives grow in size and the sheer volume of data that businesses must manage, and in many cases preserve for years to meet compliance requirements, explodes. That pressure inevitably means that RAID must evolve into something new and possibly unrecognizable, just as the dinosaurs evolved into birds. That evolution is already starting. To see what is driving it and explore at least one of the new "children of RAID," read on. G. Berton Latamore, Editor
As storage systems scale to multi-petabytes, RAID is becoming a less viable long term solution due to increasing probabilities of data loss and onerous rebuild times. This was the premise put forth by Chris Gladwin of Cleversafe to the Wikibon community at the November 30th Peer Incite.
What is the Problem with RAID?
Specifically, bit error rates (BER) of conventional disk drives run in the range of one unrecoverable error per 10^14 to 10^15 bits read. A 10TB drive holds roughly 10^14 bits, approaching the crossover point where read errors during a full pass of the drive are no longer just statistically possible but probable.
In addition, drive rebuild times are becoming dramatically elongated. For example, according to Gladwin, in the early 1990s it took under a minute to rebuild a 40MB Maxtor hard drive. Today, a 2TB drive can take eight hours or more to rebuild. Within the next six years we will see 30TB disk drives which, if nothing changes, will take nearly a month to rebuild.
With such long rebuild times, the chance of multiple drive errors, which historically has been very slight, increases dramatically. Further complicating the issue is the increased proliferation of encryption, deduplication and compression techniques, creating the possibility that the loss of a single bit will make all the data on a drive unrecoverable.
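The crossover Gladwin describes can be illustrated with back-of-envelope arithmetic. The sketch below (the function name and the simple independent-error model are ours, not Gladwin's exact math) estimates the probability of hitting at least one unrecoverable read error (URE) when reading an entire drive at the quoted 1-in-10^14 error rate:

```python
import math

def p_read_error(capacity_tb: float, bits_per_error: float = 1e14) -> float:
    """P(at least one URE) over one full read of the drive, assuming
    independent bit errors at a rate of 1 per `bits_per_error` bits."""
    bits = capacity_tb * 1e12 * 8                       # capacity in bits
    # 1 - (1 - 1/r)^bits, computed stably for tiny per-bit rates
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

for tb in (1, 2, 10):
    print(f"{tb:>2} TB drive: {p_read_error(tb):5.1%} chance of a URE per full read")
```

Under these assumptions a 1TB drive has under a 10% chance of a URE per full read, while a 10TB drive is already past a coin flip, which is the "probable, not just possible" regime the article describes.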
What is Meant by RAID?
In 1987, David Patterson, Garth Gibson, and Randy Katz published a technical paper entitled A Case for Redundant Arrays of Inexpensive Disks (RAID). In the paper the authors put forth the idea that arrays of smaller personal computer disk drives could replace larger devices and offer orders-of-magnitude improvements in performance, reliability, scalability, and power consumption.
The problem with the concept was that many more smaller PC drives meant inherently less reliability, which the authors solved by adding fault tolerance. The result was higher availability in the form of redundant disks through mirroring and parity-based RAID.
Gladwin’s premise is primarily focused on parity-based RAID – one-dimensional parity (RAID 5) and two-dimensional parity (RAID 6), the latter of which can tolerate up to two simultaneous drive losses in an array. As Gladwin points out, the math of parity doesn’t extend to three dimensions, so the industry has instead resorted to making more copies of data, which becomes an expensive solution. At some point, making multiple copies to address longer rebuild times becomes unsustainable.
Gladwin’s scenario indicates the industry will eventually hit a wall. For example, with 1 petabyte of storage and RAID 5, even with multiple copies, the odds of annual data loss are around 80%. RAID 6 extends the window and is fine for 1TB drives, but with 2TB drives in a 1,000-disk system there’s a 5% chance of annual data loss. That figure increases to 40% with 8TB drives, and eventually the chance of annual data loss will approach 100%.
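The scaling behind this wall can be seen in a toy model (ours, not Gladwin's actual calculation) of the probability that a RAID 5 rebuild fails because a URE is hit while reading the surviving drives. It assumes independent bit errors and no background scrubbing:

```python
import math

def p_rebuild_failure(n_drives: int, capacity_tb: float,
                      bits_per_error: float = 1e14) -> float:
    """P(at least one URE while reading the n-1 surviving drives
    during a RAID 5 rebuild), under an independent-error model."""
    bits_read = (n_drives - 1) * capacity_tb * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))

for tb in (1, 2, 8):
    print(f"8-drive group, {tb} TB drives: "
          f"{p_rebuild_failure(8, tb):5.1%} chance the rebuild hits a URE")
```

The exact percentages differ from Gladwin's figures (his model includes copies and annualized failure rates), but the trend is the same: as drive capacity grows, the chance a rebuild completes cleanly collapses toward zero.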
What are Some Solutions?
Several Wikibon members argued that the supplier community is well aware of the challenges and is working to address this problem. The general consensus was that RAID will not die, but its form will probably change. Techniques discussed include smaller 2.5” drives with reduced capacity (which recover faster), triple-parity RAID, and cluster RAID.
According to Gladwin, the clear long-term answer is multi-dimensional parity schemes or advanced encoding techniques that can accommodate multiple simultaneous failures (beyond two) without making extra copies. Gladwin points out that these techniques are based on Reed-Solomon math, also known as forward error correction or erasure coding. Other industries already rely on them: consumer electronics uses 56-of-64 encoding on DVDs that can tolerate eight simultaneous bit failures – enough to survive scratches on the disc – and mobile digital telephony uses forward error correction to communicate reliably over channels with bit error rates far worse than the 1-in-10^14 rate of disk drives. The point is that advanced techniques already exist that can be adopted in enterprise IT settings.
The issue, according to many in the Wikibon community, is semantics. For example, the industry is changing the granularity of the fault domain from a whole drive to a portion of a drive, a head, or even slices of data. Is that still considered RAID, and how will it be marketed? This is one of those cases where only time will tell, but it is likely new marketing terminology will emerge in force.
What About Dispersal?
The traditional way of storing data is to save files and volumes together. If a backup is needed, a copy is taken for protection and stored in another location and often replicated over a network. The result is multiple copies being stored, which can become expensive -- incurring 300% or higher overhead in some cases. That is fine for transactional data with high value, but not appropriate for many forms of unstructured data such as media files and archives, which are large and rarely accessed.
Dispersed storage uses a derivative of Reed-Solomon encoding. The data is broken up into slices (say 16) that are spread across multiple arrays in several locations. Algorithms then allow the data to be located and reassembled as required. If up to six of those slices are compromised, 100% of the data can still be reassembled from the remaining parts. In addition, if data from a site is stolen, no data can be reconstructed from the slices held there.
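The "any 10 of 16 slices suffice" property can be sketched in a few lines. The toy code below (ours, for illustration only) uses Lagrange interpolation over the small prime field GF(257); production dispersal systems use optimized Reed-Solomon codecs over GF(2^8), but the threshold-recovery idea is the same:

```python
P = 257  # small prime; every byte value 0..255 is a field element

def _interp(points, x):
    """Evaluate the unique degree < len(points) polynomial through `points` at x."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return total

def encode(data: bytes, n: int):
    """Systematic k-of-n encoding: data bytes sit at x=1..k, parity slices at x=k+1..n."""
    k = len(data)
    pts = list(zip(range(1, k + 1), data))
    return pts + [(x, _interp(pts, x)) for x in range(k + 1, n + 1)]

def decode(slices, k: int) -> bytes:
    """Rebuild the original k bytes from ANY k surviving slices."""
    pts = slices[:k]
    return bytes(_interp(pts, x) for x in range(1, k + 1))

# 16 slices, any 10 rebuild the data: lose any 6 and nothing is lost.
data = b"0123456789"                  # k = 10 data bytes
slices = encode(data, 16)
survivors = slices[6:]                # first six slices destroyed
print(decode(survivors, 10) == data)
```

Note also the security property the article mentions: any set of fewer than k slices is consistent with every possible message, so a thief holding a few slices learns essentially nothing.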
Gladwin talked specifically about Cleversafe’s approach to dispersal, which includes not only erasure coding or Reed Solomon approaches to improve reliability but also coding techniques that provide added security capabilities. Cleversafe’s dispersal virtualizes the data itself and transforms that data into dispersed information as it is stored and then transforms it back as data is read. The technique can be done across servers or within an array or potentially even within a drive. For example, if a piece of data (e.g. a 1 MB object) needs to be written, instead of writing it to a drive or spreading it over multiple drives the data is transformed into small elements (slices) which are not segments of the data but rather equations that can be used to rebuild the data at a later point in time.
Reads and writes are parallelized; individual slices are useless on their own, and the system can endure multiple simultaneous failures. The bottom line is that dispersal is capable of storing petabytes or exabytes of data with the inherent properties of reliability, data integrity, and secure data delivery.
Will Dispersal Replace RAID?
Gladwin’s scenario, which was echoed by many in the Wikibon community, sees dispersal being deployed in very large multi-petabyte environments where information security, high availability, and/or high data integrity are fundamental requirements.
For the foreseeable future dispersal will not be focused on transactional database environments, where RAID will continue to perform well and be combined with flash, for example. In these instances, multiple copies are warranted as data sets are smaller and of higher value. Dispersal will fit well with large scale applications such as social networks, medical images, certain government applications, archiving, intelligence applications and the like where storage will reach into the tens of petabytes in capacity.
The Bottom Line
RAID is not dead. However, RAID as we know it has reached an inflection point. The crossover between the bit error rates of drives and the capacities of drives, combined with onerous rebuild times, is forcing the industry to think beyond traditional parity-based RAID and introduce more granular techniques for protecting data.
In addition, new forms of storage, such as dispersal, are being introduced by startups like Cleversafe as well as established companies such as EMC and NEC. Dispersal can provide both reliability and data integrity at cloud scale and support the explosion of data growth without the need for expensive copies to offset rebuild times.
There are two main tradeoffs with dispersal. First, it is math heavy: parity-based RAID is computationally simple, whereas dispersal requires more processing power to run the encoding. Second, dispersal cannot be universally applied due to performance constraints. Nonetheless, Moore’s Law favors the dispersed approach by guaranteeing that the computing power will be there, and as cloud computing drives data growth to new heights, dispersal techniques will prove to be well positioned for many emerging applications.
Action item: Traditional RAID architectures are not well equipped to handle multi-petabyte applications that require high degrees of data assurance and integrity. New requirements are emerging to support huge distributed data farms, and more efficient techniques must be adopted to support future apps. IT organizations need to recalibrate their notion of reliability, data integrity, and information risk exposure to determine how far RAID can take them and apply emerging techniques such as dispersal to those applications that cannot bear the tradeoffs of conventional parity-based RAID.
It is ironic that RAID, now an acronym for “Redundant Array of Independent Disks”, originally stood for “Redundant Array of Inexpensive Disks”. The Wikibon Peer Incite held on November 30, 2010, entitled “Is RAID Dead?”, was one of the most illuminating to date. A persuasive case was made that RAID as it is implemented today is dead. However, traditional array specialists argued equally persuasively that RAID would continue to evolve for traditional transaction processing with multiple copies and more granular RAID, while visionaries showed how new techniques such as dispersal are better suited and lower cost for the large distributed objects of the Big-Data era. Both camps are right, and the irony is that RAID technology will become the low-latency but expensive option.
What is important is that the right model of data protection is applied to each of the many types of data. Many discussions of data protection have been over-simplified. One recently suggested that the level of data protection is dictated by the storage method used (e.g., block is high, file is medium, and object is low). The examples given in the Wikibon Peer Incite of large objects in government and healthcare applications that must have provable data integrity showed that such over-simple classifications are insufficient and misleading.
The factors that will determine the correct techniques are the same as they ever were:
- Total Cost (including the cost of storage, data transport, recovery, business continuance, archiving and disposal),
- Latency of data access,
- Bandwidth available to transfer data,
- Recovery Time (RTO),
- Data lost in recovery (RPO),
- Level of Data Integrity required (Discussed in more detail in "Guaranteeing Data Integrity").
Different types of data will be best suited to different techniques at different times in their existence. The edges of the envelope were well illustrated in the Peer Incite:
- High-value traditional databases with very low latency requirements will be best suited for “New RAID”;
- Large objects with high data integrity requirements will be best served by lower-cost data dispersal techniques;
- Metadata needs higher levels of performance and recoverability;
- Data should be moved across networks as little as possible.
What is far less clear, however, is how different parts of the envelope should be treated. As the Peer Incite illustrated, there will be strong views on different approaches that should be taken, especially by traditional array advocates. It is equally clear that traditional array techniques will be far too expensive for the tsunami of unstructured data that is smashing into data center budgets. Storage innovation is at fever pitch, with many new architectures and techniques being developed by vendors and service providers.
It is an exciting time for storage specialists if they have the business acumen to match new models of data protection to requirement.
Action item: CIOs and CTOs will need to ensure that the characteristics of new models of data protection are fully understood and a clear process implemented to help the business decide the correct balance among cost, capacity, performance, reliability, recoverability, and integrity. The best and brightest storage specialists with a deep understanding of the application and business requirements will be needed for the job.
The three cardinal sins in storage are: 1) Causing data corruption; 2) Causing data loss; 3) Causing lack of data availability. There are many vendors that focus on #3 but not many who focus on #1 and #2. And RAID as a data protection mechanism doesn’t inherently address #1.
Specifically, few storage vendors offer data integrity guarantees at the SLA level. Some (e.g. Hitachi, Xiotech and others) offer reliability/availability guarantees. The storage industry needs a new mindset that goes beyond the notion of drive reliability into data integrity.
Instead of relying on the disk to provide data integrity, the industry increasingly needs to implement software to provide data integrity and recovery. ZFS, for example, is a step in the right direction: it is smart enough to protect against silent data corruption by using data integrity checks. Some vendors – e.g., Cleversafe – are providing an SLA that says not only is the data available but the bits are right.
The traditional way of storing data is to store files and volumes together. If a backup is needed, a copy is taken and stored in another location or the data is replicated over a network. Many copies of data result. That is fine for transactional data, but a large overhead (~300%) for media files, archives, and large unstructured data sets (e.g., e-mail), which are large and rarely accessed.
In 2009, Cleversafe introduced the concept of dispersed storage using a derivative of Reed-Solomon encoding providing M-of-N fault tolerance – the successor to RAID for petabyte-sized repositories of unstructured content. The data is broken up into slices (say 16) that are spread across multiple arrays in multiple locations. The Cleversafe algorithms then allow the data to be located and reassembled as required. Even if up to six of those slices are unavailable – because drives, arrays, or entire sites are down or destroyed – 100% of the data can still be reassembled without loss. In addition, if data from a site is stolen, no data can be reconstructed.
To guarantee data integrity, Cleversafe’s storage nodes compute and store integrity check values for each slice they keep. The integrity values are proactively checked for correctness by a background process, meaning the system isn’t waiting for a read to discover an error. This is crucial for long-term retention and preservation of data.
Additionally, the slice server will check the integrity of any requested slice prior to returning it to the client. If found to be invalid, the server will respond as if it does not have the slice, therefore preventing the corruption from propagating to a higher level. As a last line of defense, a data-source-level integrity check value is computed and compared by the client after it has reassembled a data source. The outcome - bad data will never reach the application or end-user.
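The read-path and scrub behavior described above can be modeled in a few lines. The sketch below is a toy model under our own naming (it is not Cleversafe's implementation or API): a digest is stored with each slice and verified both by a background scrub and on every read.

```python
import hashlib

class SliceStore:
    """Toy model of per-slice integrity checking: a digest travels with
    each slice and is re-verified by a scrub process and on every read."""

    def __init__(self):
        self._slices = {}                      # slice_id -> [data, digest]

    def put(self, slice_id: str, data: bytes):
        self._slices[slice_id] = [data, hashlib.sha256(data).hexdigest()]

    def get(self, slice_id: str):
        """Return the slice only if its digest still matches; otherwise behave
        as if the slice were missing, so corruption never propagates upward."""
        data, digest = self._slices[slice_id]
        return data if hashlib.sha256(data).hexdigest() == digest else None

    def scrub(self):
        """Background process: list slice ids whose stored data has rotted."""
        return [sid for sid, (d, h) in self._slices.items()
                if hashlib.sha256(d).hexdigest() != h]

store = SliceStore()
store.put("s1", b"good slice")
store.put("s2", b"soon corrupt")
store._slices["s2"][0] = b"bit rot!"           # simulate silent corruption
print(store.scrub())                           # flags s2
print(store.get("s1") is not None, store.get("s2") is None)
```

A caller that receives None simply treats the slice as lost and rebuilds the data from the remaining slices, which is exactly how corruption is kept from reaching the application layer.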
As a result, Cleversafe is able to offer data integrity SLAs to its customers, since it can always verify the integrity of data each time it is read. Cleversafe’s customers look to address all three of the cardinal sins of storage by leveraging Reed-Solomon M-of-N fault tolerance to replace RAID and leveraging integrity checks to address data corruption.
Action item: As cloud computing and big data applications become more prevalent, the storage industry needs to move beyond the mindset of providing high availability into the realm of data integrity.
Imagine taking 1,000 photographs at a family wedding and archiving them to the cloud. The service takes all the files, compresses and encrypts them, and stores them in a single object called “wedding2010”. Fifteen years later you go to retrieve those pictures. In the process of writing the data out, storing the object, moving the data over 15 years, and finally retrieving it, one bit in 10^8 is flipped. The data has been corrupted, and now none of the pictures can ever be seen again.
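This failure mode is easy to demonstrate. The toy sketch below uses zlib to stand in for the hypothetical archive service (the "photos" payload is fabricated filler): one flipped bit in a compressed object loses the entire payload, not just one picture.

```python
import zlib

# Fabricated stand-in for 1,000 archived photos
photos = b"".join(b"photo-%04d-pixels" % i for i in range(1000))
archive = zlib.compress(photos)

corrupted = bytearray(archive)
corrupted[len(corrupted) // 2] ^= 0x01         # flip a single bit mid-stream

try:
    recovered = zlib.decompress(bytes(corrupted))
except zlib.error:
    recovered = None                           # stream is no longer decodable

print("archive recovered intact:", recovered == photos)
```

Compression removes redundancy, so every surviving bit carries more meaning; encryption has the same amplifying effect, since a flipped ciphertext bit garbles an entire block or stream on decryption.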
Yupu Zhang et al from the University of Wisconsin-Madison have done some ground-breaking research that systematically injected errors that would naturally occur into systems. They showed that even in a high-functioning file system such as Sun’s ZFS, data integrity was compromised. Many of the problems were in the memory management part of the system. They argue persuasively that systems “…should be designed with end-to-end data integrity as a goal.”
Currently, Wikibon believes that no vendor or service guarantees end-to-end data integrity. The current architectures of storage systems, memory systems, middleware and applications cannot deliver on such a guarantee. Cleversafe, who was a Wikibon 2009 CTO award winner, uses its bit-slicing technology to enable a data integrity guarantee for the storage piece of the stack. This is being used by government agencies and healthcare providers that need to store large amounts of data for long periods of time. However, these agencies still have much complex work to do on their side to fill in the integrity holes in the rest of the stack.
Consumers expect data integrity (or uncorrupted data), and government agencies (especially in Europe) are likely to expect and insist on data integrity guarantees. Wikibon believes that the IT industry should be working together to develop the necessary architectures and standards before government agencies mandate them.
Action item: IT senior management and their business customers need to identify end-to-end data integrity as an SLA, and should be pushing Infrastructure 2.0 system, service and software providers to include this capability. They should be wary of vendors who make claims of data integrity without acknowledging they must fit within an end-to-end data integrity architecture.
The following realities are facing forward-looking storage administrators:
- Bit error rates for disk drives will soon reach parity with drive capacities, meaning that every disk drive will lose some data.
- The negative impact of a bit error is magnified by the use of encryption, compression, and de-duplication techniques, all of which are on the increase.
- RAID-rebuild times are lengthening to the point where the probability of a second drive failure during the RAID-rebuild process is becoming unacceptably high, and the impact on application service levels is becoming too great.
- Capacity growth and budget limitations will force companies to leverage more cost-effective, high-capacity drives for most data, but much of this data cannot be protected from a site loss using traditional methods, because the high cost of bandwidth and the limited size of the data pipes prevent companies from replicating it to multiple sites in a timely fashion.
- RAID will not sustain the protection of data in the areas of greatest data growth: unstructured data and massive, structured-data repositories.
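The bandwidth constraint in the fourth point above is easy to quantify. A back-of-envelope sketch (the link speeds and the 80% utilization figure are illustrative assumptions, not figures from the article):

```python
def replication_days(data_tb: float, link_gbps: float,
                     utilization: float = 0.8) -> float:
    """Days needed to push `data_tb` terabytes over one link at the
    assumed sustained utilization."""
    bits = data_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * utilization) / 86400

for tb, gbps in ((100, 1), (1000, 1), (1000, 10)):
    print(f"{tb:>5} TB over {gbps:>2} Gbps: {replication_days(tb, gbps):6.1f} days")
```

Even a fully dedicated 1 Gbps link needs on the order of weeks to move 100 TB, which is why timely multi-site replication of bulk unstructured data is impractical for most budgets.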
A variety of approaches will be applied over the coming years to enable companies to continue to use the current approach of RAID, synchronous metro-area replication, and remote asynchronous replication for data protection in latency-sensitive transaction databases. These may include the use of double-parity RAID protection, leveraging smaller form-factor and lower-capacity drives, increasing the ratio of controllers to drives, and de-duplicating or compressing data before transmitting to the remote site. For most organizations, this is not an area of significant concern. The real challenge for organizations is how to cost-effectively protect unstructured data and massive structured-data repositories from the near-certainty of bit errors on drives and from the less-probable but catastrophic impact of a data-center loss.
Many organizations have gone through massive data-center consolidation initiatives and reduced the total number of data centers into a few, regional super centers. When discussing data centers to support an organization’s transaction systems, a case can be made for consolidating down to as few as two data centers. For the applications experiencing the greatest data growth, however, data-dispersal methods leveraging the organization’s existing, dispersed infrastructure together with cloud-based offerings may be used to an advantage.
Action item: Before consolidating unstructured data and structured-data repositories into fewer, larger super-centers, organizations should establish a group to evaluate new data-dispersal approaches to ensure data reliability and integrity. This group should be prepared to evaluate the data-protection approaches which can be applied not only to their own organization and data centers but also to cloud-storage providers. Ultimately, the data protection approach needs to match the service level agreement (SLA) for the application, and the ability of a storage system or service to meet an SLA should be evaluated both when all components of the system are working and also during points of inevitable component failure.
The traditional methodology for protecting data is to first use RAID locally and second to replicate information to a remote site where RAID is once again used for protection. The telecommunications and data movement costs are not trivial, and you end up with four copies of data. As companies consider the impact of the marginalization of RAID, there are classes of data where alternatives to replication should be considered.
Classes of applications that are not appropriate for large-scale replication methods include large distributed databases, file systems, distributed medical records from hundreds of hospitals, and any kind of large unstructured content such as image, audio, or archive data. An alternative to replication is information dispersal: splitting data into multiple pieces that can be recovered from some threshold subset of those pieces. Data dispersal offers the same benefits as replication, such as data integrity and recoverability, but with slices rather than copies, which consume fewer resources. Replication remains an important tool, but as companies look to manage and extract value from big data pools, alternative methods should be investigated.
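The resource difference comes down to simple arithmetic. The four-copy figure is from the replication scenario above (RAID locally plus RAID at the remote site); the 10-of-16 ratio matches the dispersal example discussed earlier:

```python
def overhead(raw_capacity_multiple: float) -> str:
    """Extra raw capacity consumed beyond one logical copy of the data."""
    return f"{(raw_capacity_multiple - 1) * 100:.0f}% overhead"

replication = 4        # local RAID copy plus a RAID'd copy at the remote site
dispersal = 16 / 10    # 16 slices stored, any 10 rebuild the data

print("4-copy replication:", overhead(replication))   # 300%
print("10-of-16 dispersal:", overhead(dispersal))     # 60%
```

At petabyte scale, the gap between 300% and 60% overhead is the difference between buying and powering roughly 4 PB of raw capacity per protected petabyte versus 1.6 PB.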
Action item: Consider eliminating replication for certain applications. Understand the impact of data growth, RAID limitations and the options for data dispersal.
Footnotes: Storage Directions in an Era of Big Data