Storage Peer Incite: Notes from Wikibon’s January 15, 2008 Research Meeting
This week Wikibon presents IBM and XIV: Is this the blue cloud? Earlier this month IBM surprised the industry with its announcement that it was purchasing XIV, a small, innovative Israeli storage company founded by well-known systems designer Moshe Yanai. The immediate question is just what IBM wants to do with XIV's product, NEXTRA. IBM itself is initially positioning this innovative design as a Web 2.0 mass storage device, hence the reference to the "network cloud" in this week's title. But while NEXTRA has some features in common with the Google File System (GFS), its real strength is its ability to deliver Tier 1a performance using cheap, Tier 3, off-the-shelf components. Thus it looks more like a general service disk box, and this is how it is being used by XIV's initial customers in Israel. This resemblance has led some industry participants to speculate that it may be a replacement for the DS8000 line. However, it lacks mainframe support, which negates this role at least for the moment. And the problem with all this speculation is that IBM has not yet revealed its pricing for the box. Given IBM's ability to buy in huge bulk, it can drive the price of the components down to the point that it can charge as low a price as it wants and still make a profit. At a low enough price, NEXTRA could push some competitors out of existence.
To the extent that this purchase is about NEXTRA, it certainly does indicate a trend at IBM toward block storage. However, lost in the speculation is the reality that what IBM has really purchased is a top flight innovative design team. Leading expert and Wikibon community member Josh Krischer, who has spent time with the XIV team, says Yanai has recruited eight graduates from the Israeli Army's elite engineering design scholarship system. He speculates this team, which certainly will wind up in one of IBM's two Israeli design centers, may have been IBM's real motivation for the purchase. If so, they may drive IBM's entire line of storage products into the next generation. What NEXTRA may really be, therefore, is a harbinger of the future of IBM storage -- very low cost, using commodity components, and focused on the center of the marketplace rather than the extreme high end of performance. Bert Latamore
IBM rang in 2008 with the surprise acquisition of XIV, an Israeli firm founded by Moshe Yanai, the lead developer of EMC’s Symmetrix cached disk array. Immediately, the positioning wars began with some of IBM’s competitors proclaiming that NEXTRA, XIV’s disk array, signals the death of IBM’s DS8000 line, while IBM itself stressed the product’s positioning as infrastructure to support Web 2.0 applications.
NEXTRA uses an asymmetric cluster approach, with interface modules to perform front-end processing and provide connections to hosts, and data modules that store data on high capacity SATA disks at the back-end. Both modules rely on off-the-shelf Intel processors and standard components. NEXTRA scales by adding either interface or data modules to support more front-end bandwidth or back-end capacity respectively. Thus NEXTRA resembles a clustered storage subsystem using high capacity SATA devices (750GB or 1TB) based on a Linux platform where data is virtualized.
NEXTRA has many similarities with the Google File System (GFS), which is commonly seen as the reference model for Web 2.0 cloud computing, namely the spreading of data, self-healing, and dirt cheap components. However, it has substantial differences as well, including its block-based architecture and relatively centralized resource proximity (versus the highly distributed file architectures of GFS and others). This leads the Wikibon community to conclude that XIV is positioned more as a general purpose enterprise array, closer to 3PAR than GFS and will not today compete in the Web 2.0 cloud computing space.
What does this acquisition mean for IBM customers? While only time will tell, the Wikibon community believes that in the intermediate term the NEXTRA architecture is positioned at the high end relative to products from LSI/Engenio and NetApp. Longer term, with the addition of mainframe support, it could in fact replace the IBM DS8000. It appears IBM is betting on clustered storage as its future mainstream block-based storage approach, but in the near term it is not likely to alter storage strategies.
What does this architecture mean for the future of general purpose enterprise storage? Generally, NEXTRA is delivering tier 1a/tier 2 performance with tier 3 devices. The market NEXTRA is aimed at is systems in excess of 30TBs priced at approximately $150,000 and up and will likely support large, non-mainframe, general-purpose enterprise tier 2 storage and perhaps an emerging set of new block-based applications.
Action item: IBM’s commitment to invest a speculated $300+M in essentially what appears to be a top notch research and development capability represents a big bet in the enterprise clustered storage market. This has unclear implications but interesting possibilities to use commodity components to reduce costs for general-purpose storage. However, customers need to push IBM to understand where NEXTRA fits and see some meaningful uptake of the product in mainstream customer applications before committing.
IBM's claims that acquiring XIV was about "positioning the company to address emerging storage opportunities like Web 2.0 applications, digital archives and digital media," are hard to swallow at the moment. While there are similarities, such as dirt cheap components, spreading data and self-healing, Web 2.0 storage attributes include a highly distributed file system, not block-based storage in close proximity.
In fairness to IBM, Andy Monshaw, General Manager, IBM System Storage, stressed 'long term', and the company may have other 'aha' cards to play in this game of invigorating its storage portfolio. But it seems more likely in the immediate term that IBM has 1) acquired a strong development team and strengthened its ability to attract and retain talent in Israel, an emerging strategic global development center; and 2) taken steps to regain control of its midrange, general purpose storage, which the company has ceded to LSI and NetApp.
It's the latter point that will most likely have an impact in the near term and explains why IBM is positioning NEXTRA, XIV's architecture, as far from its current midrange as possible so as not to disrupt demand. But it's highly likely that IBM will begin to deploy higher- and lower-end versions of NEXTRA and migrate existing midrange and even higher-end DS8000 customers to the new architecture.
Picture this. In the second half of 2008, IBM announces a new innovative line of clustered enterprise storage arrays that are virtualized, high performance and scale from very small to very large using dirt cheap, off-the-shelf components. The company puts on a big marketing push largely aimed at migrating (and protecting) existing customers and, as well, attacking traditional monolithic architectures as outdated. Current IBM customers get to buy into the new vision at very attractive prices and IBM may even gain some share in the process.
While this sounds like a familiar refrain from IBM (except for the innovative clustered part) this time the company may have an architecture with some staying power.
Action item: Loyal IBM customers should take a wait-and-see approach with NEXTRA and push IBM to more credibly position today's offering, or expose the XIV/NEXTRA/clustered storage roadmap to Web 2.0.
Despite the rapid development of SAN and NAS networks, there are still large amounts of direct-attached storage (DAS) in the data center. In fact, Wikibon user discussions indicate that there is plenty of "hidden" DAS in data centers. Storage people interviewed are always a little embarrassed about this, explaining that whatever project they are working on (e.g. virtualization) will help bring this under control. One storage administrator commented that users are not "...comfortable with the idea of storage administration moving things around," underscoring a lack of confidence in IT.
While less common than ten years ago, many organizations still have unintended incentives for business lines to circumvent centralized storage management. This leads to increased information risk related to litigation, security, compliance and the like.
IBM's XIV approach legitimizes clustered storage. When clustered storage arrives with great force in one to three years' time, there will be very little price differential between it and DAS. The only reasons left for not using shared storage will be artificially inflated internal chargebacks for storage or the perception of bad service from storage administration.
Action item: IT executives need to put in organizations that will actually manage all storage in the data center and ensure compliance. This will reduce risk, reduce cost and bring storage under control. To achieve this, storage administration must first develop a reputation for being able to supply storage at a competitive price, with flexible service. This will take time and needs to be in place before the tsunami of clustered storage washes over the data center.
Nextra's architecture does a nice job of making a system using tier 3 disk drives perform like one using tier 1 drives (low RPM, low cost SATA drives versus high RPM, high cost Fibre Channel drives). And, of course, tier 3 drives cost a lot less than tier 1 devices.
So can users deploy Nextra for tier 1 storage? Yes, but it will cost more to get the same data availability. The reason is the Nextra architecture. Nextra achieves tier 1 performance by 1) advanced caching algorithms, and 2) spreading data across all the drives in a data module in 1MB chunks. For availability, Nextra then mirrors these chunks to another data module. In addition, XIV/IBM claim that Nextra can rebuild a 750 GB SATA drive in an astonishing 20 minutes. So, if a drive fails, the data remains available via the second copy and 20 minutes later, the data is fully mirrored again.
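As a rough check on the rebuild claim, the arithmetic can be sketched as below. The 750 GB capacity and the 20-minute figure come from the claim above; decimal units and the 60 MB/s per-drive sustained rate are assumptions chosen only for illustration, not published Nextra specifications, and the function names are hypothetical.

```python
import math


def rebuild_rate_mb_per_s(capacity_gb: float, minutes: float) -> float:
    """Aggregate throughput (MB/s, decimal units) needed to re-mirror
    capacity_gb of data within the given number of minutes."""
    return capacity_gb * 1000 / (minutes * 60)


def drives_needed(rate_mb_per_s: float, per_drive_mb_per_s: float) -> int:
    """Minimum number of drives that must share the rebuild traffic,
    given an ASSUMED sustained rate per drive."""
    return math.ceil(rate_mb_per_s / per_drive_mb_per_s)


# Figures from the claim: one 750 GB SATA drive re-mirrored in 20 minutes.
rate = rebuild_rate_mb_per_s(750, 20)
print(f"required aggregate rate: {rate:.0f} MB/s")  # 625 MB/s

# ASSUMPTION: 60 MB/s sustained per SATA drive, for illustration only.
print("drives sharing the rebuild:", drives_needed(rate, 60.0))  # 11
```

Under these assumptions the claim implies roughly 625 MB/s of aggregate rebuild traffic, which would only be plausible if the work is shared across many drives rather than written to a single replacement drive. That is consistent with the spread-across-all-drives design described above, although the actual figure depends on how full the failed drive was.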
The problem is what if two drives fail simultaneously (one in the primary data module, one in the secondary data module)? By the nature of Nextra’s spread-data-across-all-drives approach, the loss of one drive affects all data in that data module, regardless of whether it holds the primary chunks or the mirrored chunks. If another drive fails, but this time in a data module holding the mirrored chunks and while the first failed drive is still being rebuilt, a lot of data will be lost.
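The double-failure exposure can be illustrated with a toy model. This is a minimal sketch, not XIV's actual placement algorithm: it assumes chunks are placed uniformly at random on the drives of a primary and a mirror data module, and the drive counts, the notion of "volumes", and all function names are hypothetical.

```python
import random


def place_chunks(num_chunks, drives_per_module, seed=0):
    """Return one (primary_drive, mirror_drive) pair per chunk.
    ASSUMPTION: uniform random placement within each module."""
    rng = random.Random(seed)
    return [
        (rng.randrange(drives_per_module), rng.randrange(drives_per_module))
        for _ in range(num_chunks)
    ]


def degraded_chunks(placement, failed_primary):
    """Chunks left with a single copy after one primary-module drive fails."""
    return [i for i, (p, _) in enumerate(placement) if p == failed_primary]


def lost_chunks(placement, failed_primary, failed_mirror):
    """Chunks whose primary AND mirror copies both sit on failed drives."""
    return [
        i for i, (p, m) in enumerate(placement)
        if p == failed_primary and m == failed_mirror
    ]


# Hypothetical sizes: 120,000 chunks, 12 drives per module, 20 volumes.
placement = place_chunks(num_chunks=120_000, drives_per_module=12)
degraded = degraded_chunks(placement, failed_primary=3)
lost = lost_chunks(placement, failed_primary=3, failed_mirror=7)

# ASSUMPTION: chunk i belongs to volume i % 20 (chunks of each volume are spread).
affected_volumes = {i % 20 for i in lost}

print("chunks with one copy left after first failure:", len(degraded))
print("chunks lost after second failure:", len(lost))
print("volumes touched by the loss:", len(affected_volumes))
```

In this model only a fraction of chunks lose both copies, but because every volume's chunks are spread across all drives, the lost chunks are scattered across many volumes rather than confined to one. That scattering is the exposure described above.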
So what is the probability of two drives failing simultaneously? It is surprisingly high due to what I call the 'cluster' or 'Black Swan' effect. One example of this effect is a result of disk drive manufacturing processes and generally manifests itself as follows: if a drive manufacturer makes 100 drives and 10 of those are bad, one user will get nine of the bad ones. Multiple, simultaneous drive failures can and do happen. This is one of the reasons we are seeing the industry shift to RAID-6.
Tier-1 storage is generally regarded as high performance with high or ultra-high availability, with the “ultra” usually coming from remote replication. Nextra does offer a synchronous replication capability to another Nextra system, but this feature was really designed for replicating to another data center. Nonetheless, a user could put in two Nextra systems side by side and use local replication to mitigate the impact of simultaneous disk drive failures, but this would cost twice as much, and the user could not also replicate to a remote site, at least not yet.
Action item: Users need to carefully consider whether Nextra meets their tier-1 storage availability requirements or consider utilizing Nextra for tier-2 and below.
As Nick Allen points out in Nextra Implementation/Availability Considerations, the intriguing part about XIV/NEXTRA is that it offers tier-1 (or tier-1a) class performance with tier-3 disk drives, and presumably tier-3 pricing. Estimates indicate that IBM can use its buying power to acquire commodity components at 50% of the price at which XIV could acquire the same components. This could lead to some serious pricing actions on IBM's part, although IBM's corporate overheads have a way of moderating such behavior in the market.
So how should competitors react to this announcement? It seems that right now everyone is taking a wait-and-see approach...why panic? IBM bought Mylex for $240M to beef up its low-end and midrange products, then sold the company and increased its business with LSI. Now it's re-acquiring in the upper end of the midrange. Like a portfolio manager, IBM seems to be constantly diversifying, filling gaps, hedging bets and selling losers to manage its own financial performance. Why try to respond to such a strategy?
But by all accounts, the first two weeks of 2008 are pointing to a very weird year in storage. EMC has re-invigorated the solid state disk market, using a platform (Symmetrix) invented by a rock star-like engineer (Moshe Yanai) who has joined the company/division he once buried (IBM Storage). The company he helped save from the brink of oblivion (EMC) is now top storage dog and could take the same marketing tactic that IBM used nearly 20 years ago to create FUD around Symmetrix (i.e. NEXTRA may not be up to tier-1 availability).
The point is that, just as in 1988, when David Patterson, Garth Gibson, and Randy Katz set in motion a series of monumental events with "A Case for Redundant Arrays of Inexpensive Disks (RAID)," a bunch of really smart developers are now creating innovations built on many of the concepts being promoted by XIV (and several others). These systems use clustered storage, data spreading, self-healing and dirt cheap commodity components and are attempting to crack the current storage old guard with a new breed of economically attractive storage capabilities.
Action item: Storage customers have always rewarded lightning-fast performance, rock-solid reliability and dirt-cheap prices. Suppliers must figure out how to deliver these capabilities in both block- and file-based environments, applying a new mentality to storage design that breaks the highly customized, dual-controller approach used by most midrange storage products today.
Every array storage salesperson dreams of walking into Google and securing a large storage order. There are some similarities between Google's storage approach and XIV. Both use commodity servers with storage directly attached. Can it happen this time?
The dream will not be fulfilled with IBM's NEXTRA as it is today. To look at the reasons is instructive; it tells us what is missing from NEXTRA and other clustered services to allow them to compete with Google in Web 2.0.
Google File System (GFS) is embedded with the storage system. The storage services it offers include many of those found in NEXTRA, including "chunking" data, self-healing and using industry standard components. Two of GFS's many additional features illustrate the gap that remains:
- GFS manages the optimization of keeping data physically as close as possible to consumers by managing the placement of data in the network. This minimizes bandwidth and more importantly improves user response time and "user experience;"
- GFS always appends data (never overwrites). This simplifies "locking" on chunks, retrieving data versions and achieving consistency of data in the event of a storage failure.
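The append-only idea in the second point can be illustrated with a toy in-memory log. This is a minimal sketch and not GFS's actual interface: the class `AppendOnlyChunk` and its methods are hypothetical names, and a real system would add persistence, replication and concurrency control.

```python
class AppendOnlyChunk:
    """Toy chunk that only ever grows: records are appended, never overwritten."""

    def __init__(self):
        self._records = []  # list of (key, value) tuples in arrival order

    def append(self, key, value):
        """Add a new version of key and return its offset in the chunk."""
        self._records.append((key, value))
        return len(self._records) - 1

    def latest(self, key):
        """Return the most recently appended value for key."""
        for k, v in reversed(self._records):
            if k == key:
                return v
        raise KeyError(key)

    def versions(self, key):
        """Return every value ever appended for key, oldest first."""
        return [v for k, v in self._records if k == key]


chunk = AppendOnlyChunk()
chunk.append("page", "v1")
chunk.append("page", "v2")
print(chunk.latest("page"))    # v2
print(chunk.versions("page"))  # ['v1', 'v2']
```

Because existing records are never modified, a reader can never observe a half-overwritten record, and older versions stay retrievable. This is the simplification of locking and version retrieval that the bullet describes.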
IBM will need to add a file system to XIV to allow it to compete in the Web 2.0 arena. For Google to be interested, it would have to offer additional functionality to GFS that would make it worthwhile migrating from GFS. Probably the only way that will happen is a joint agreement between IBM and Google to put GFS on XIV. It is difficult to see the motivation that would drive such a deal, but stranger things have happened.
Action item: IBM will need significant enhancement to XIV to allow it to compete in the Web 2.0 space. Trying to make one storage solution fit every requirement is unlikely to succeed. Users should sit back and wait and see where IBM decides to focus development effort with XIV. Large enterprise tier 2 storage would seem the obvious starting point.