In a Peer Incite with Shutterfly last year, we wrote that erasure coding techniques were not the best fit for performance-intensive applications. The Shutterfly use case we reviewed was clearly designed for archiving. At the time, we suggested the following:
With erasure coding... "you need to architect different methods of managing data with plenty of compute resource. The idea is to spread resources over multiple nodes, share virtually nothing across those nodes and bet on Intel to increase performance over time. But generally, such systems are most appropriate for lower performance applications, making archiving a perfect fit."
In the context of Shutterfly, these comments still stand: it was a classic tier 2/tier 3 environment, clearly architected for archiving.
The bet on Intel is paying off. There has been discussion in the Wikibon community about applying erasure coding to more performance-intensive applications. Amplidata, for example, has made aggressive claims that its approach performs well without compromise. Cleversafe has also indicated that it is making major strides toward supporting more performance-intensive applications.
Much of this debate is semantics. Consider Hadoop: it is a "high-performance" batch processing platform, but its parallelized (i.e., non-locking) nature means it is certainly not an online/OLTP platform. Still, erasure coding methods are becoming more efficient, and there are clear examples where the approach is far better suited to performance-oriented applications than it once was. Many will remember when RAID 6 was a real struggle for processors and was criticized as overhead-intensive; today it is essentially table stakes.
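RAID 6 is itself a simple erasure code: k data blocks plus two parity blocks, P and Q. The minimal Python sketch below shows where its processor overhead came from. It assumes the P+Q scheme commonly used for RAID 6, computed over the Galois field GF(2^8) with reducing polynomial 0x11D and generator 2; these are common choices, but implementations vary. P is a plain XOR, as in RAID 5. Q weights each block by a field coefficient, so every byte costs a Galois-field multiply. The function names are hypothetical, and the byte-at-a-time loops are written for clarity rather than speed. Real implementations typically use lookup tables or SIMD instructions.

```python
from __future__ import annotations

GF_POLY = 0x11D  # assumed reducing polynomial: x^8 + x^4 + x^3 + x^2 + 1
GENERATOR = 2    # assumed generator g for the Q coefficients


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # overflowed 8 bits, so reduce by the field polynomial
            a ^= GF_POLY
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The 255 nonzero elements form a multiplicative group, so a^254 == a^-1
    return gf_pow(a, 254)


def compute_pq(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all blocks (cheap); Q = sum of g^i * block_i (needs GF multiplies)."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_pow(GENERATOR, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_from_p(blocks: list[bytes | None], p: bytes) -> bytes:
    """Rebuild the one missing block (marked None) by XOR-ing P with the survivors."""
    missing = bytearray(p)
    for block in blocks:
        if block is not None:
            for j, byte in enumerate(block):
                missing[j] ^= byte
    return bytes(missing)


def recover_from_q(blocks: list[bytes | None], q: bytes) -> bytes:
    """Rebuild the one missing block from Q alone, e.g. when P is also lost."""
    lost = blocks.index(None)
    partial = bytearray(q)
    for i, block in enumerate(blocks):
        if block is not None:
            coeff = gf_pow(GENERATOR, i)
            for j, byte in enumerate(block):
                partial[j] ^= gf_mul(coeff, byte)
    # partial now holds g^lost * D_lost; multiply by the inverse coefficient
    inv = gf_inv(gf_pow(GENERATOR, lost))
    return bytes(gf_mul(inv, byte) for byte in partial)


if __name__ == "__main__":
    data = [b"disk", b"data", b"here"]
    p, q = compute_pq(data)
    print(recover_from_p([data[0], None, data[2]], p))
    print(recover_from_q([data[0], data[1], None], q))
```

Recovering from P needs only XOR. Recovering from Q needs a field multiply for every byte of every surviving block, plus an inverse. That per-byte arithmetic is the work that once strained processors and that faster, multi-core CPUs now absorb.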
We expect erasure coding techniques to move steadily up the performance spectrum, supporting not only archiving but increasingly business-critical applications. Intel's advances and rising core counts position erasure coding to become the accepted way to protect critical data; in our view, it is just a matter of time. Disk drive rebuild times are becoming onerous: as capacity under the actuator increases, RAID-level protection becomes inadequate because the probability of a failure during a rebuild eventually approaches 100%. In addition, as flash systems become more prevalent, erasure coding will be the most sensible way to protect flash. Rebuild times are one reason, but more importantly, flash is an inherently unreliable medium that requires special techniques such as garbage collection to avoid losing data. Flash is 'dirty' by its very nature, and erasure coding is a near-term solution to these problems.
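The rebuild argument can be illustrated with a simplified model. Suppose unrecoverable read errors (UREs) occur independently at a fixed per-bit rate p. The chance of hitting at least one while reading n bits during a rebuild is then 1 - (1 - p)^n, which climbs toward 1 as n grows. In a single-parity (RAID 5) rebuild, one such error on a surviving drive means data that cannot be reconstructed. The sketch below assumes a rate of one error per 10^14 bits and a hypothetical six-drive RAID 5 group. Both are illustrative assumptions, and real drives do not fail this uniformly.

```python
import math


def rebuild_failure_probability(bytes_read: float, ure_per_bit: float) -> float:
    """P(at least one unrecoverable read error) while reading bytes_read bytes,
    assuming independent errors at a fixed per-bit rate (a simplification)."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, evaluated via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    URE_PER_BIT = 1e-14   # assumed: 1 error per 10^14 bits read
    SURVIVING_DRIVES = 5  # hypothetical 6-drive RAID 5 group with one failed drive
    for drive_tb in (1, 2, 4, 8, 16):
        # Every surviving drive must be read in full to rebuild the failed one
        bytes_read = SURVIVING_DRIVES * drive_tb * 1e12
        p = rebuild_failure_probability(bytes_read, URE_PER_BIT)
        print(f"{drive_tb:>2} TB drives: P(URE during rebuild) = {p:.1%}")
```

The exponent grows linearly with the number of bits read. Each increase in drive capacity therefore pushes the result closer to 1, which is the trend described above.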
Action Item: Wikibon has been following erasure coding as applied to computer storage systems since its early commercial days last decade. We believe the approach will eventually go mainstream as databases are optimized for systems that use the technique and Intel continues to deliver higher-performance cores. Erasure coding will change the economics of storage and data management, and practitioners should begin to identify use cases that suit this approach. Long-term archiving is a good place to start because it is low risk and cost sensitive. Lessons from these initial implementations should then be applied to push erasure coding up the performance spectrum and gain a near-term competitive cost advantage.
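As a rough illustration of the economics, the sketch below compares three protection schemes by the raw capacity each consumes per unit of usable capacity: two-way mirroring, three-way replication, and a hypothetical 10+4 erasure code. It assumes a maximum distance separable code such as Reed-Solomon, where any k of n fragments can rebuild the data, with each fragment on a separate drive or node. The scheme parameters are illustrative assumptions. The model also ignores metadata, spare capacity, and the extra compute that erasure coding requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionScheme:
    """A k-of-n protection scheme: data is written as n fragments,
    and any k of them are enough to read it back."""
    name: str
    k: int  # fragments needed to reconstruct the data
    n: int  # fragments written in total

    @property
    def tolerated_failures(self) -> int:
        # Any n - k fragments (drives or nodes) can be lost
        return self.n - self.k

    @property
    def raw_per_usable(self) -> float:
        # Raw capacity consumed for each unit of usable capacity
        return self.n / self.k


SCHEMES = [
    ProtectionScheme("2-way mirroring", k=1, n=2),
    ProtectionScheme("3-way replication", k=1, n=3),
    ProtectionScheme("10+4 erasure code (hypothetical)", k=10, n=14),
]

if __name__ == "__main__":
    usable_pb = 1.0  # hypothetical usable capacity target, in petabytes
    for s in SCHEMES:
        print(f"{s.name:<34} tolerates {s.tolerated_failures} failures, "
              f"needs {usable_pb * s.raw_per_usable:.2f} PB raw")
```

Under these assumptions, the erasure-coded layout survives more simultaneous failures than three-way replication while consuming well under half the raw capacity. This is the kind of trade-off that makes low-risk, cost-sensitive archives a natural first use case.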
Footnotes: