Nextra's architecture does a nice job of making a system built on tier 3 disk drives perform like one built on tier 1 drives (low-RPM, low-cost SATA drives versus high-RPM, high-cost Fibre Channel drives). And, of course, tier 3 drives cost a lot less than tier 1 devices.
So can users deploy Nextra for tier 1 storage? Yes, but it will cost more to achieve the same data availability, and the reason lies in the Nextra architecture. Nextra achieves tier 1 performance through (1) advanced caching algorithms and (2) spreading data across all the drives in a data module in 1 MB chunks. For availability, Nextra then mirrors these chunks to another data module. In addition, XIV/IBM claims that Nextra can rebuild a 750 GB SATA drive in an astonishing 20 minutes. So if a drive fails, the data remains available via the second copy, and 20 minutes later the data is fully mirrored again.
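A quick back-of-the-envelope calculation shows why the distributed 1 MB chunk layout makes such a fast rebuild plausible. Only the 750 GB drive size and the 20-minute rebuild claim come from the discussion above; the per-drive transfer rate is an illustrative assumption:

```python
# Sanity check of the claimed 20-minute rebuild of a 750 GB drive.
# Drive size and rebuild time are from the article; the 50 MB/s
# sustained per-drive rate is an assumed, illustrative figure.

drive_capacity_gb = 750      # failed SATA drive (from the article)
rebuild_minutes = 20         # claimed rebuild time (from the article)

# Aggregate bandwidth required to rewrite the lost data in 20 minutes:
required_mb_per_s = drive_capacity_gb * 1000 / (rebuild_minutes * 60)
print(f"Aggregate rebuild rate needed: {required_mb_per_s:.0f} MB/s")

# No single SATA drive sustains that rate, so a one-to-one disk copy
# would take hours. Because the chunks are spread across many drives,
# each surviving drive only has to contribute a small share:
per_drive_mb_per_s = 50      # assumed sustained rate per SATA drive
drives_needed = required_mb_per_s / per_drive_mb_per_s
print(f"Drives working in parallel: ~{drives_needed:.1f}")
```

In other words, the same spread-data-everywhere design that delivers tier 1 performance is also what makes the 20-minute rebuild claim credible: the rebuild is a many-to-many copy rather than a single-drive bottleneck.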
The problem arises if two drives fail simultaneously (one in the primary data module, one in the secondary data module). By the nature of Nextra's spread-data-across-all-drives approach, the loss of one drive affects all data in that data module, regardless of whether it holds the primary chunks or the mirrored chunks. If a second drive fails in a data module holding the mirrored chunks while the first failed drive is still being rebuilt, a great deal of data will be lost.
So what is the probability of two drives failing simultaneously? It is surprisingly high, due to what I call the 'cluster' or 'Black Swan' effect. One example of this effect stems from disk drive manufacturing processes and generally manifests itself as follows: if a drive manufacturer makes 100 drives and 10 of them are bad, one unlucky user will get nine of the bad ones. Multiple simultaneous drive failures can and do happen, and this is one of the reasons the industry is shifting to RAID-6.
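To put a rough number on the exposure, here is a minimal sketch of the second-failure risk during one rebuild window. Only the 20-minute window comes from the article; the annual failure rate and drive count are assumed figures, and the model assumes independent failures, so the 'cluster' effect described above would make the real-world number worse, not better:

```python
# Illustrative model (assumptions, not figures from the article):
# probability that a second drive fails somewhere in the array
# during one 20-minute rebuild, assuming independent failures.

annual_failure_rate = 0.03   # assumed 3% AFR per SATA drive
remaining_drives = 180       # assumed number of surviving drives
window_hours = 20 / 60       # the 20-minute rebuild window
hours_per_year = 24 * 365

# Per-drive probability of failing during the window
# (AFR spread evenly across the year):
p_one = annual_failure_rate * window_hours / hours_per_year

# Probability that at least one of the remaining drives fails
# before the rebuild completes:
p_second = 1 - (1 - p_one) ** remaining_drives
print(f"P(second failure during rebuild) ~ {p_second:.2e}")
```

The per-rebuild probability looks small, but it compounds across every drive failure over the life of the array, and correlated failures from a bad manufacturing batch can be far likelier than this independence assumption suggests.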
Tier-1 storage is generally regarded as high performance with high or ultra-high availability, with the “ultra” usually coming from remote replication. Nextra does offer synchronous replication to another Nextra system, but this feature was really designed for replicating to another data center. Nonetheless, a user could deploy two Nextras side by side and use local replication to mitigate the impact of simultaneous disk drive failures; however, this would cost twice as much, and the user could not replicate to a remote site – at least not yet.
Action Item: Users need to carefully consider whether Nextra meets their tier-1 storage availability requirements, or deploy Nextra for tier-2 storage and below.