It is ironic that RAID, now an acronym for “Redundant Array of Independent Disks”, originally stood for “Redundant Array of Inexpensive Disks”. The Wikibon Peer Incite held on November 30, 2010, entitled “Is RAID dead?”, was one of the most illuminating to date. A persuasive case was made that RAID as it is implemented today is dead. However, traditional array specialists argued equally persuasively that RAID would continue to evolve for traditional transaction processing, with multiple copies and more granular RAID, while visionaries showed how new techniques such as dispersal are better suited, and lower cost, for the large distributed objects of the Big-Data era. Both sets of advocates are right, and the irony is that RAID technology will become the low-latency but expensive option.
What is important is that the right model of data protection is applied to each of the many types of data. Many discussions of data protection have been over-simplistic. One recently suggested that the level of data protection is dictated by the storage method used (e.g., block is high, file is medium, and object is low). The examples given in the Wikibon Peer Incite of large objects in government and healthcare applications that must have provable data integrity showed that such simple classifications are insufficient and misleading.
The factors that will determine the correct techniques are the same as they ever were (a simple sketch of how they might be weighed follows the list):
- Total Cost (including the cost of storage, data transport, recovery, business continuance, archiving and disposal),
- Latency of data access,
- Bandwidth available to transfer data,
- Recovery Time (RTO),
- Data lost in recovery (RPO),
- Level of Data Integrity required (discussed in more detail in "Guaranteeing Data Integrity").
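As an illustration only, the Python sketch below shows one way these factors could be captured and used to shortlist protection techniques for a given class of data. The scheme names, field names, and numbers are assumptions made for the example, not Wikibon figures or recommendations.

```python
from dataclasses import dataclass

@dataclass
class ProtectionScheme:
    """Hypothetical profile of a data-protection technique (illustrative values only)."""
    name: str
    cost_per_tb: float      # total cost of ownership, $/TB
    read_latency_ms: float  # typical access latency
    rto_hours: float        # recovery time objective the scheme can meet
    rpo_minutes: float      # worst-case data loss window
    integrity_level: int    # 1 (basic) .. 3 (provable integrity)

def shortlist(schemes, max_latency_ms, max_rto_hours, max_rpo_minutes, min_integrity):
    """Keep only schemes that meet the requirements, cheapest first."""
    fits = [s for s in schemes
            if s.read_latency_ms <= max_latency_ms
            and s.rto_hours <= max_rto_hours
            and s.rpo_minutes <= max_rpo_minutes
            and s.integrity_level >= min_integrity]
    return sorted(fits, key=lambda s: s.cost_per_tb)

# Entirely hypothetical candidates and numbers, for illustration only.
candidates = [
    ProtectionScheme("New RAID + remote copy", 900.0, 1.0, 1.0, 0.0, 2),
    ProtectionScheme("Erasure-coded dispersal", 250.0, 20.0, 8.0, 15.0, 3),
]

# A low-latency transactional database vs. a large-object archive.
print([s.name for s in shortlist(candidates, 5, 2, 5, 2)])
print([s.name for s in shortlist(candidates, 50, 24, 30, 3)])
```

The point of the sketch is not the numbers but the process: the same data-driven comparison, repeated for each class of data, is what separates a deliberate protection strategy from a one-size-fits-all default.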
Different types of data will be best suited to different techniques at different points in their lifecycle. The edges of the envelope were well illustrated in the Peer Incite:
- High-value traditional databases with very low latency requirements will be best suited for “New RAID”;
- Large objects with high data integrity requirements will be best served by lower-cost data dispersal techniques (a rough overhead comparison follows this list);
- Metadata needs higher levels of performance and recoverability;
- Data should be moved across networks as little as possible.
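To make the cost point concrete: dispersal schemes are typically built on k-of-n erasure coding, where any k of the n stored fragments can reconstruct the object, so the raw capacity overhead is n/k while up to n − k fragment losses are tolerated. The sketch below compares that arithmetic with simple two- and three-copy replication; the particular (k, n) values are illustrative assumptions, not figures from the Peer Incite.

```python
def replication_overhead(copies: int):
    """Raw-capacity multiplier and failures tolerated for N full copies."""
    return copies, copies - 1

def dispersal_overhead(k: int, n: int):
    """Raw-capacity multiplier and failures tolerated for k-of-n erasure coding."""
    return n / k, n - k

# Illustrative comparison (parameters are assumptions, not measurements):
print("2-way mirror:        overhead %.2fx, tolerates %d loss(es)" % replication_overhead(2))
print("3-way replication:   overhead %.2fx, tolerates %d loss(es)" % replication_overhead(3))
print("10-of-16 dispersal:  overhead %.2fx, tolerates %d loss(es)" % dispersal_overhead(10, 16))
```

Under these illustrative parameters the dispersed layout stores 1.6x the raw data yet survives six simultaneous fragment losses, whereas three-way replication stores 3x and survives two. That is the arithmetic behind the "lower cost for large objects" argument; the trade-off is the extra latency of reassembling fragments, which is why low-latency transactional data still favors RAID-style approaches.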
What is far less clear, however, is how different parts of the envelope should be treated. As the Peer Incite illustrated, there will be strong views on which approaches should be taken, especially from traditional array advocates. It is equally clear that traditional array techniques will be far too expensive for the tsunami of unstructured data that is smashing into data center budgets. Storage innovation is at a fever pitch, with many new architectures and techniques being developed by vendors and service providers.
It is an exciting time for storage specialists, provided they have the business acumen to match new models of data protection to business requirements.
Action Item: CIOs and CTOs will need to ensure that the characteristics of new models of data protection are fully understood and that a clear process is implemented to help the business decide the correct balance among cost, capacity, performance, reliability, recoverability, and integrity. The best and brightest storage specialists, with a deep understanding of the application and business requirements, will be needed for the job.