As RAID becomes increasingly pervasive, users sometimes struggle to fully understand what the different levels signify. Most people know what RAID 0 or RAID 5 means, but fewer understand the principles of RAID 60 or RAID 5EE, for example.
All RAID levels can be characterized in terms of performance, capacity and reliability, and each level has been designed to strike a particular balance among these three key parameters.
In the most basic terms, RAID is built around the “simple” levels: RAID 0, 1, 5 and 6. Some of these have ‘exotic’ variations, like RAID 1E, 5E and 5EE, that behave much like the simple levels from which they derive but add extra features.
The table below summarizes the principles of the four basic RAID levels and their trade-offs in terms of performance, reliability and the amount of raw capacity the user must dedicate to obtain them.
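| RAID level | Technique | Min. disks | Usable capacity | Failures tolerated | Performance characteristics |
|---|---|---|---|---|---|
| RAID 0 | Striping, no redundancy | 2 | 100% | 0 | Best reads and writes |
| RAID 1 | Mirroring | 2 | 50% | 1 disk | Good reads; writes limited to a single disk |
| RAID 5 | Striping with single distributed parity | 3 | (N-1)/N | 1 disk | Good reads; writes pay a parity penalty |
| RAID 6 | Striping with dual distributed parity | 4 | (N-2)/N | 2 disks | Good reads; writes pay a larger parity penalty |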
RAID levels can also be aggregated (or “spanned”) to create nested levels like RAID 10, 50 and 60 -- and these are often the least understood. Before looking at those, the ‘exotic’ single-level variations mentioned above are worth defining:
- R1E (mirroring across more than two disks, possibly an odd number)
- R5E (same as RAID 5, but with integrated ‘hot’ spare capacity)
- R5EE (same as R5E, but with the ‘hot’ spare capacity distributed within each stripe)
What needs more thought are the complex RAID levels like RAID 10, 50 and 60, which add some extremely interesting but less widely understood features. These are basically a collection of RAID volumes of the same level and size, with data striped across them RAID 0-style to balance performance.
So, for example, a RAID 50 of 20 drives could be built from four RAID 5 arrays of 5 disks each, with a RAID 0 stripe across them. The question, however, is: why should someone choose something as complex as a 20-disk RAID 50 rather than simply building a single RAID 5 with the same 20 disks?
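Structurally, that 20-disk RAID 50 is a two-step mapping: data is striped RAID 0-style across the four spans, and each span then places its chunk on one of its own disks. The Python sketch below illustrates the idea; the `raid50_locate` helper, the round-robin layout and the omission of parity rotation are simplifying assumptions for illustration, not the algorithm of any particular controller.

```python
# Minimal sketch of the two-step placement in a hypothetical 20-disk RAID 50
# (4 RAID 5 spans of 5 disks each). The round-robin layout is purely
# illustrative and ignores parity rotation.

NUM_SPANS = 4        # RAID 0 stripes across these spans
DISKS_PER_SPAN = 5   # each span is an independent RAID 5 array

def raid50_locate(stripe_unit: int) -> tuple[int, int]:
    """Return (span, data_disk_within_span) for a logical stripe unit."""
    span = stripe_unit % NUM_SPANS                 # step 1: RAID 0 across the spans
    unit_in_span = stripe_unit // NUM_SPANS        # position inside the chosen span
    disk = unit_in_span % (DISKS_PER_SPAN - 1)     # step 2: one disk's worth of space holds parity
    return span, disk

if __name__ == "__main__":
    for unit in range(8):
        span, disk = raid50_locate(unit)
        print(f"logical unit {unit:2d} -> span {span}, data disk {disk}")
```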
The main reason to prefer the RAID 50 is that RAID 5 read performance is very poor in degraded mode. To recover the data from a bad block or dead disk in the 20-disk RAID 5, the 19 surviving disks must be read and 18 XOR operations performed before the data can be returned. This is roughly 20x slower than a normal read, and the more disks involved, the worse it becomes. The same applies to rebuilds: if each disk is 1TB, rebuilding one requires moving 20TB of data and XORing 19TB of it (which could literally take weeks to complete).
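A minimal sketch of why the degraded read is so expensive: regenerating one missing chunk means XORing every surviving chunk of the stripe, data plus parity. The chunk contents and the 20-disk geometry below are made up purely for illustration.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized chunks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct_missing(survivors: list[bytes]) -> bytes:
    """Regenerate the missing chunk by XORing all surviving chunks of the stripe."""
    return reduce(xor_blocks, survivors)

if __name__ == "__main__":
    data = [bytes([i] * 4) for i in range(1, 20)]   # 19 data chunks of a 20-disk RAID 5 stripe
    parity = reduce(xor_blocks, data)               # the 20th chunk is their parity
    lost = data[7]                                  # pretend the disk holding chunk 7 died
    survivors = data[:7] + data[8:] + [parity]      # 19 reads and 18 XORs later...
    assert reconstruct_missing(survivors) == lost
    print("regenerated chunk:", reconstruct_missing(survivors).hex())
```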
With RAID 50, however, the missing data is regenerated within the failed 5-disk span alone: only 4 reads and 3 XOR operations are needed, which is about 5x more efficient than the RAID 5 counterpart. RAID 5 uses the equivalent of one parity disk, while RAID 50 uses one per span -- four in this case -- so it is, as expected, a trade-off between capacity and performance. The same logic applies to rebuilds: recovering a failed disk means moving 5TB of data and XORing 4TB, still a long operation but 4x shorter than in the RAID 5 example.
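The arithmetic behind those figures can be checked with a quick back-of-the-envelope helper; the 1TB disk size and the simple read-plus-rewrite accounting below are assumptions taken from the example above, ignoring any controller overhead.

```python
def degraded_cost(disks_in_array: int, disk_tb: float = 1.0):
    """Cost of regenerating one disk's data in a single parity-protected array or span."""
    reads = disks_in_array - 1            # every surviving disk must be read
    xors = disks_in_array - 2             # XOR operations across the surviving chunks
    moved_tb = reads * disk_tb + disk_tb  # read the survivors, then write the replacement disk
    return reads, xors, moved_tb

raid5_whole = degraded_cost(20)   # one flat 20-disk RAID 5
raid50_span = degraded_cost(5)    # only the failed 5-disk span of the RAID 50 is involved

print("20-disk RAID 5 : reads=%d, XORs=%d, data moved=%.0f TB" % raid5_whole)
print("RAID 50 span   : reads=%d, XORs=%d, data moved=%.0f TB" % raid50_span)
```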
Configurations can be complex; however, a basic understanding of the problems that need to be solved will help enormously in choosing the most appropriate RAID level for any application. The bottom line is that there is usually some trade-off between performance, capacity and reliability.