Storage Peer Incite: Notes from Wikibon’s January 22, 2008 Research Meeting
This week's subject is EMC's recent set of DMX announcements. Hidden in a list of routine upgrades was something that will change the technology and performance of storage -- the inclusion of a solid-state disk (SSD) option, specifically the STEC NAND-based devices, in EMC's DMX-4 array. These are more expensive than traditional disk, but they provide a huge leap in performance, essentially creating a Tier 0, the near-equivalent of holding the data in memory. They also have no moving parts, and in fact SSD has mainly been used in industrial situations where the environments are too harsh for disk use, and in smart phones, PDAs, and other tiny consumer devices too small to accommodate disk drives. But the flash devices EMC is using are not the same as those used in consumer applications. Because they have been modified for enterprise applications, and have no moving parts, they may prove more dependable than spinning disks. EMC and STEC are estimating an average five-year life-span for the NAND devices based on accelerated aging tests. And we believe that the vendors' estimates may be conservative.
Today, because of its price, the flash option is intended mainly for customers with very high-performance I/O needs. The common practice today is to spread storage for these applications across large numbers of disks, at 20% or lower utilization, to accelerate performance. The solid-state option can support that high I/O rate at high utilization, which can make it much more cost-attractive relative to disk in these intense environments.
However, the price of flash-based NAND is dropping steadily, driven in part by economies of scale realized through increasing demand in consumer applications. Flash disks are beginning to appear in high-end portable computers, and inevitably as prices fall this technology will replace disks in an increasing number of applications.
Based on the Peer Incite research meetings, we've tried to make this newsletter about your business. We summarize the community's input from the meeting and document specific advice for users (IT), organizational considerations, technology integration issues, and vendor actions. We also address the all-important 'getting rid of stuff' (GRS). Bert Latamore
EMC's announcement of the inclusion of NAND-based flash memory devices, OEMed from STEC, within the DMX-4, on a snowy day in Boston, shocked the storage world and ushered in a new era of storage tiering, adding tier 0 to an increasingly granular offering from EMC.
This new class of storage is aimed at highly active database logging and indexing applications, certain financial applications, and perhaps also short message service (SMS) systems, although it is unclear that text messaging requires a Symmetrix-class subsystem. Partly because of the impressive secrecy prior to the announcement, it had three immediate and significant effects on the marketplace:
- No leaks equals a head start for EMC: This announcement has given EMC a clear lead over the competition.
- The surprise has created organic marketing momentum for EMC as competitors, pundits, and writers scrambled to decipher what was announced, what it meant, and, in the case of competitors, what to say and do about it. The result was an announcement with a market impact far beyond its small marketing budget.
- In one brilliant marketing move, intended or not, EMC has flipped the balance of market power from its former defensive position, constantly responding to competitive performance and functionality claims, to a leadership position with next-generation technology and performance.
At the same time, EMC made several other functionality announcements:
- 1TB 7200 rpm SATA-II drives for the DMX-4 only,
- A new GigE Channel Director with hardware-assisted IPv6 and IPsec encryption,
- Cascaded SRDF,
- HyperPAVs,
- Array-based Native Compatible Flash (copy),
- Virtual Provisioning, EMC’s version of thin provisioning.
Of these, Virtual Provisioning is perhaps the most notable, as 3PAR and Hitachi have enjoyed leads of several years and several months, respectively, over EMC. The other enhancements are incremental improvements to the DMX platform, with the majority of the function available for both DMX-4 and older DMX-3 systems.
As it pertains to Virtual Provisioning, EMC has dramatically simplified the provisioning of both ‘fat’ and thin volumes and appears to have caught up to Hitachi in the simplicity game while setting its sights on 3PAR.
Will this announcement have ripple effects across the industry? Yes. IDC estimates the worldwide market for enterprise-class disk drives is approaching $7B in 2008. Assuming 4%-5% of data are candidates for solid-state disk (SSD)-class performance, the OEM market for STEC exceeds $300M. At a 3x-4x subsystem vendor markup, the market for tier-0-class function could exceed $1B in the near term. The inclusion of flash-based devices in the DMX may also give a boost to standalone vendors such as Texas Memory Systems and Solid Data, which use more expensive, higher-performance DRAM but will probably begin to blend in more cost-effective NAND-based flash devices.
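The back-of-envelope arithmetic behind those figures can be made explicit. The midpoints chosen below (4.5% of data, 3.5x markup) are illustrative assumptions, not IDC or Wikibon estimates:

```python
# Back-of-envelope tier-0 market sizing using the figures cited above.
# The midpoints (4.5% of data, 3.5x markup) are illustrative assumptions.

enterprise_drive_market = 7.0e9   # IDC: ~$7B worldwide in 2008

ssd_candidate_fraction = 0.045    # midpoint of the 4%-5% range
oem_market = enterprise_drive_market * ssd_candidate_fraction
print(f"OEM market for STEC: ${oem_market / 1e6:.0f}M")        # ~$315M

subsystem_markup = 3.5            # midpoint of the 3x-4x vendor markup
tier0_market = oem_market * subsystem_markup
print(f"Tier-0 function market: ${tier0_market / 1e9:.2f}B")   # ~$1.10B
```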
What does this announcement mean for EMC’s major competitors? The Wikibon community believes the competition will be forced to respond, even if simply to have a tier-0 capability as a check-off item. It appears that EMC does not have an exclusive on the STEC technology, which suggests that, with some microcode work to address drive wear leveling and some rigorous testing, competitors, specifically Hitachi, could have a response by year’s end.
What does this announcement mean to customers in the near and intermediate term? It suggests that tiering continues to become more granular, allowing users to get more out of installed infrastructure. In the case of SSD, customers can take advantage of Symmetrix functionality (e.g., SRDF and, importantly, thin provisioning to improve utilization of an expensive resource) on tier-0 devices. Best practice implies that turning off read-ahead will accelerate performance and exploit these devices to the fullest. Durability remains an unknown, and users should exercise caution, specifically with respect to the possibility of needing to service these devices frequently. However, it appears that STEC and EMC have conservatively spec’d the durability of these devices at a five-year lifespan.
In the case of thin provisioning, IBM appears to be bringing up the rear, and IBM customers must sort out IBM’s plans to compete with this functionality using the DS8000 or XIV’s NEXTRA architecture. To the extent IBM’s customers require thin provisioning in the near term, they may want to consider alternatives.
The bottom line is this announcement has, overnight, made EMC substantially more competitive.
Action item: EMC’s DMX customers should identify the applications that can benefit from flash-based SSD and begin testing the capability. However, users should be aware that questions remain regarding the durability and, importantly, the serviceability of these devices. Although tier-0 storage is roughly 30x the price of conventional storage, users should evaluate it not on cost/MB but on cost per I/O. Non-DMX customers should wait to understand competitive responses, which should be forthcoming. In and of itself, the inclusion of SSD is not a reason to incur subsystem switching costs.
EMC’s January 14th, 2008, announcement further brings into focus the different philosophies of the major high end players. For EMC, the approach is to jam substantial functionality inside the DMX as the company continues to increase the granularity of its in-box tiered storage offerings. By adding an ultra-high performance tier 0 with the inclusion of NAND-based flash drives, EMC is targeting the 4%-5% of data that can exploit such functionality. Conceptually the company’s DMX device portfolio can be drawn like an HSM pyramid, originally conceived more than 20 years ago, now with 1TB SATA drives consuming the bottom layer of the pyramid.
For existing DMX customers, this is good news, although its attractiveness threatens to further deepen the reliance on EMC. Loyal EMC customers should continue to investigate virtualization engines such as EMC’s Invista, Hitachi’s USPVM and IBM’s SVC as a means of rationalizing applications that don’t need to reside on the highest end systems. Building this type of flexibility into the storage portfolio will reduce software and maintenance costs, keep service levels aligned with requirements and simplify migration in some cases. Indeed, while EMC’s Virtual Provisioning will dramatically accelerate storage allocation, many of the migration and data movement complexities remain and this type of strategy can help considerably.
One added user concern with flash-based tier-0 devices is the durability of the drives, specifically the number of writes per cell a device can handle. If drives wear out, they may need to be serviced more frequently than conventional disk drives. While native NAND flash technology has been rated for at least 100,000 writes per cell, EMC and STEC, the OEM supplier of the devices to EMC, have worked around this problem and extended the longevity of the device by performing ‘wear leveling’ across extra capacity (~2x) built into the device manufactured for EMC. It is not unreasonable to expect these devices to outlive the useful life of an array.
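A rough calculation shows why ~2x over-provisioning with ideal wear leveling yields a multi-year lifespan. The cell endurance and over-provisioning figures come from the text above; the drive capacity and sustained write rate below are illustrative assumptions:

```python
# Idealized flash endurance estimate under perfect wear leveling.
# Cell endurance (100,000 writes) and ~2x over-provisioning come from
# the text; capacity and write rate are illustrative assumptions.

cell_endurance = 100_000      # rated writes per cell (from the text)
usable_gb = 73                # assumed usable drive capacity, GB
overprovision = 2.0           # ~2x extra capacity built in (from the text)
write_rate_mb_s = 50          # assumed sustained write rate, MB/s, 24x7

total_writable_gb = usable_gb * overprovision * cell_endurance
seconds = total_writable_gb * 1024 / write_rate_mb_s
years = seconds / (3600 * 24 * 365)
print(f"Idealized endurance at {write_rate_mb_s} MB/s sustained: {years:.1f} years")
```

Even under this punishing continuous-write assumption, the idealized lifespan comfortably exceeds the five-year warranty, which is consistent with the view that the vendors' specification is conservative.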
On balance, many in the Wikibon community believe these devices have been specified conservatively by STEC and EMC. EMC’s track record leads us to believe this is the case; however, no one really knows for sure. The devices have been in the market for only a year, and these estimates are based on the vendors’ accelerated life tests.
Action item: Users should be encouraged by EMC’s latest announcement as it gives them clear reasons to keep investing in the platform. As expected, many of the functions announced are also available to DMX-3 customers, as EMC has done its typically good job of protecting previous investments while holding a carrot out for customers to move to the DMX-4 (flash and 1TB SATA devices). Customers should understand how EMC and STEC have increased the reliability of NAND-based flash devices, identify candidate applications and begin testing the function in high performance applications.
The many potential approaches to introducing NAND storage in the data center are shown in the Wikibon entry “Integrating NAND technology into the data center”. This variety has the potential to bring confusion as technicians and vendors debate the merits of different approaches. The bottom line is that NAND is, and will continue to be, expensive, but it has the potential to improve performance for users or maintain performance at a lower cost.
To streamline decision-making for introducing NAND devices, a total systems approach is recommended. The benefits of improved performance must be agreed with the departments paying for the applications. The cost side has to include the server, server RAM, storage subsystem, virtualization layer costs, and data management software, as well as any impact on the costs of development and operations. A cost/benefit model specific to the business applications should be developed with the ability to compare different storage hierarchy approaches and different cost assumptions.
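A minimal sketch of such a model is shown below, comparing two ways to reach a target I/O rate: many short-stroked hard drives versus a single flash drive. The 52,000-IOPS flash figure appears in the drive specifications later in this note; every cost figure and the per-HDD IOPS rate are illustrative assumptions to be replaced with numbers from your own environment:

```python
# Minimal total-system cost model: cost per delivered IOPS for
# short-stroked HDDs vs. one flash drive. All dollar figures and the
# 180 IOPS/HDD rate are placeholder assumptions; the 52,000 IOPS flash
# figure is from the STEC specifications quoted in this note.

def cost_per_iops(drive_cost, drives_needed, sw_and_maint_per_drive, iops_delivered):
    """Acquisition plus software/maintenance cost per delivered IOPS."""
    total = drives_needed * (drive_cost + sw_and_maint_per_drive)
    return total / iops_delivered

target_iops = 50_000

# Option A: 15K RPM HDDs at ~180 IOPS each, short-stroked
hdd = cost_per_iops(drive_cost=1_000, drives_needed=target_iops // 180,
                    sw_and_maint_per_drive=300, iops_delivered=target_iops)

# Option B: one flash drive rated at 52,000 sustained IOPS
ssd = cost_per_iops(drive_cost=30_000, drives_needed=1,
                    sw_and_maint_per_drive=300, iops_delivered=target_iops)

print(f"HDD option: ${hdd:.2f}/IOPS, flash option: ${ssd:.2f}/IOPS")
```

Even with the flash drive at 30x the per-drive price, the cost per I/O can favor flash for workloads that genuinely need the I/O rate, which is why the newsletter repeatedly stresses cost per I/O over cost per gigabyte.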
Action item: Storage executives are going to be faced with a plethora of different approaches to storage management from different parts of the business and different vendors. A single IT department should be charged with developing a simple total-system model of the costs and performance trade-offs of different components and approaches within the storage hierarchy. All technical and business cases should be run through the model. The model should be simple enough to maintain and should be updated as experience with deploying NAND technologies increases.
NAND technology has changed the marketplace for personal hand-held devices. Lower power consumption and improved durability have driven the eclipse of magnetic media in this sector. The same is likely to happen in the laptop market. This has created a very high-volume flash market, which is driving prices down fast. This marketplace is driven by consumer dynamics on a much wider scale than just the PC market.
So where is the best place to introduce such technology in the data center? Not all the advantages of NAND technology are equally relevant there. For example, saving power is nice, but there is no business case if the drives are thirty times more expensive.
EMC has introduced a NAND solid-state disk (SSD) "tier-0" layer within its high-end DMX storage array. The good news is that the tier-0 SSD disks look like any other disk and, with a few minor tweaks, can take advantage of the array storage management software, including EMC's thin provisioning software. The bad news is the price: ~$30,000+ per disk. A small ten-disk configuration for RAID 6 plus spares could cost $300K for just over one terabyte of storage. Data can be moved dynamically to this type of storage only from within the same array; candidate volumes will have to specify that this type of storage is required, and moving data from outside the array will require a disruption to the applications.
The current alternatives for improving I/O performance are large server RAM (no I/O is the best I/O), larger storage cache (storage controller RAM, more expensive than NAND SSD but able to improve the performance of all I/O in an array), stand-alone SSDs, or short-stroking disks. These are well-tried and well-understood alternatives. For some specific applications where there are consistent, random, high access rates to a small amount of data, or very high I/O write rates that swamp the array's fast-write capabilities, SSDs will be a valuable addition to the storage administrator's armory.
EMC OEMs the drives from STEC and does not have any exclusive capability. Hitachi's array architecture would allow an additional benefit if it decides to put STEC flash drives in the controller. Hitachi's approach of putting virtualization in the controller has the advantage of high performance and the ability to move volumes dynamically to that device from any array in the data center. This could mean much better utilization of the expensive SSDs.
The biggest limitation to effective use of this device is that whole volumes are allocated to it. Wouldn't it be nicer if the blocks of data that have high-activity from any volume or file could be migrated to the solid-state disks, and the rest to standard disk? This would utilize the SSDs much more efficiently and would automate the allocation process.
3PAR's block-based virtualization is the closest architecture to this ideal, with the theoretical ability to monitor each block and migrate blocks to the optimum location. IBM's newly acquired XIV technology has a similar architecture and could also benefit from this type of approach. The use of SSD disks and the block-based virtualization architecture could be used to bring tier-two+ storage up to tier-one+ performance in incremental steps.
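The hot-block promotion idea can be sketched as a toy tierer that counts per-block accesses and keeps the hottest blocks on SSD. This illustrates the concept only; it is not 3PAR's, XIV's, or any vendor's actual algorithm:

```python
# Toy sketch of block-level tiering: count accesses per block and keep
# the N most-accessed blocks on a small SSD tier, the rest on HDD.
# A conceptual illustration only, not any vendor's implementation.

from collections import Counter

class BlockTierer:
    def __init__(self, ssd_capacity_blocks):
        self.ssd_capacity = ssd_capacity_blocks
        self.access_counts = Counter()
        self.ssd_resident = set()

    def record_access(self, block_id):
        self.access_counts[block_id] += 1

    def rebalance(self):
        """Place the hottest blocks on SSD; return (promoted, demoted) sets."""
        hottest = {b for b, _ in self.access_counts.most_common(self.ssd_capacity)}
        promote = hottest - self.ssd_resident
        demote = self.ssd_resident - hottest
        self.ssd_resident = hottest
        return promote, demote

tierer = BlockTierer(ssd_capacity_blocks=2)
for block in [7, 7, 7, 3, 3, 9, 1]:
    tierer.record_access(block)
promoted, demoted = tierer.rebalance()
print(sorted(promoted))   # blocks 3 and 7 are the hottest
```

Because only the hot blocks occupy the expensive tier, a small SSD pool can serve the active fraction of many volumes, which is the efficiency argument made above.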
Appliance virtualization approaches such as IBM's SVC and EMC's Invista would seem to put significant complexity and performance impediments in the way. Introducing block-based virtualization into these appliances could improve the attractiveness of this approach.
Microsoft's announcement of a slew of server virtualization features points to the potential of bringing virtualization of I/O back into the file system at the server level, allowing the placement of blocks on different performance devices to optimize the cost/performance balance for the application as a whole and minimizing the cost of array controllers. Traditional storage array functions such as fast write could be moved to the drives. It will be interesting to see whether Google integrates such functionality into the Google File System and how EMC's Hulk and Maui will incorporate these technologies.
Action item: NAND storage will continue to drop in price and will very probably be a disruptive technology to the traditional storage array market. Storage executives should develop a close understanding of the applications running in their data centers, and develop cost models of performance and capacity specific to their business. This will allow discussion to rise above vendor polemics and the allure of the architecture "du jour".
EMC’s January 2008 DMX enhancements caught competitors by surprise with the inclusion of NAND-based flash devices and have shifted the marketing momentum back in EMC’s favor.
While many in the Wikibon community expect Hitachi to respond directly in 2008 to EMC’s solid state disk (SSD) announcement, as well as to the 1TB SATA drives, Hitachi clearly designed its USPV for a different approach, and in-box tiering, if it comes, would be a defensive move responding to EMC's strong arguments for greater in-box tiering granularity. Hitachi's USPV virtualized controller not only has substantial function but can also extend that function to systems external to the controller. Despite the added ‘hop’ to reach external systems, this approach is viewed as superior by customers who want to extend the life of already-installed assets.
Hitachi customers have no need to panic over this announcement. They can take a wait-and-see approach and evaluate the durability of NAND-based flash devices. But Hitachi’s marketing response will be critical to reinforcing this thinking.
IBM’s posture is a bit of an enigma, as the company has not previously indicated its plans to keep pace with enhancements such as thin provisioning, although it still may do so in some fashion, perhaps invoking the SVC as part of that strategy. As a processor supplier, IBM may choose to address the performance problem with a combination of database and memory techniques. Also, many point out that IBM has used flash technology in its BladeServer product line, which is perfectly reasonable albeit a different technology than that deployed by EMC.
Based on existing messages, IBM customers are left wondering, and the company needs to be clearer about the direction of the DS8000 and its recent acquisition of XIV. Customers should press IBM hard for details, or look elsewhere if functionality such as thin provisioning or ultra-high-performance flash devices is a critical requirement.
Outside of the very high end of the market, the need to formally respond to EMC’s announcement is less pressing but will be a topic of interest among customers nonetheless. Competitors should evaluate the applicability of this technology in their respective markets, as consumer trends are driving prices on a steeper downward slope than those of conventional drives.
Action item: EMC’s direct DMX competitors should demonstrate they have an understanding of flash-based storage and are working on a similar capability. They should educate customers on the challenges of durability of these devices and provide credible evidence that they have an approach to address the problem and integrate these products into their existing lines, all the while stressing that companies like STEC are OEM suppliers who typically don’t sign exclusive deals with a single customer.
Can you really achieve the savings and performance benefits promised with flash drives in a DMX?
First let’s look at the flash drive specifications and claims.
STEC Zeus-IOPS Drive Specifications:
- Random transactional performance in excess of 52,000 IOPS sustained,
- Over 200 times the transactional performance of a 15K RPM enterprise-class disk drive,
- Less than half the power consumption of a 15K RPM HDD,
- Sustained random or sequential large-block transfers up to 200 MB/s,
- Read/write transactional performance 225MB/s / 107MB/s,
- Read/write sequential performance (MB/s) 200MB/s / 100MB/s,
- 5 year warranty.
EMC Claims for STEC Flash Drives in a DMX:
- 10x faster response time,
- 30x IOPS improvement,
- 98% less power per IO,
- 38% less power per drive,
- No moving parts for high reliability.
Second, let's look at what EMC has done to accommodate these flash drives:
EMC tweaks to DMX to optimize flash drives: Since 2003 EMC has been enhancing the Symmetrix DMX in preparation for flash drives. These enhancements include:
- RAID-5: while "mirror everything" used to be the way-of-Symmetrix, you just can't justify the cost for every application any more, and it's probably overkill for enterprise flash drives.
- TimeFinder/Snaps: Space-saving snapshots. With the cost of SSD, you don't want to make any more copies of your data than absolutely needed. The recent Asynchronous Copy on First Write enhancements ensure that the Snaps have minimal impact on the response times of the primary volumes on the flash drives.
- Modular Packaging: Symmetrix DMX-3 and DMX-4 are "enterprise-modular" arrays, allowing almost unlimited configuration flexibility - you can have one "quadrant" supporting as many as 600 drives for maximum capacity, or a quadrant optimized for performance with as few as 32 drives. This approach now lets you dedicate a quadrant to flash drives to maximize their performance (you'll still need the 32 regular disk drives in that quadrant to support the DMX's vault, but you can use the space on those drives for other things as well).
- Cache Partitioning: With flash drives, you don't really need a lot of cache for reads, but you do want to have a modicum of cache for pending writes. In an interesting twist, you might actually want to decrease cache to a bare minimum for read-intensive applications. Dynamic Cache Partitioning helps to ensure that your memory is used where it's needed most, even as the system dynamically reallocates based on actual workloads.
- Symmetrix Priority Controls: Similarly, you want to be sure that the flash drives receive the appropriate relative priority to everything else in the system, and internally the DMX uses the underlying mechanisms to protect "normal" disk drives from starvation caused by the hyper-responsive flash drives.
- Virtual Provisioning: This one's probably obvious, but with the cost of flash drives, you really want to buy as little of it as possible, so thin provisioning is almost imperative to maximize utilization. Over-provisioning allows for future growth with a minimum of hassle - just add another group of flash drives to the pool before expanding your databases.
- Switched Infrastructure: In addition to the inherent fault-isolation and reliability improvements afforded by the new point-to-point DMX-4 back-end, it also serves to minimize the latency overhead for the flash drives. While the overhead of an arbitrated loop is minimal and practically undetectable for a regular hard drive, even a little latency is noticeable with flash drives. And if/when future enterprise-class flash drives hit the market with a SATA interface instead of Fibre Channel, the DMX-4 is ready.
- Asynchronous Replication: while clearly justifiable on the merits of being able to replicate data a significantly longer distance than possible with synchronous replication, asynchronous replication is expected to be the preferred method of protecting data stored on flash drives, for a very simple reason: after you've paid to attain minimal response times, the last thing you're probably going to do is add another millisecond or two of transmission time to your writes.
- SRDF/S Response Time improvements: But if your application does require synchronous replication, you'll want the fastest possible response times, so the enhancements made in the latest microcode levels could well make a lot of difference for flash drives.
- Write Folding: With effective write performance that pretty much matches read latencies, there's not a lot to be gained, performance-wise, by caching writes to the disk. But buffering writes can help reduce the wear and tear on the drive. The longer the DMX can delay sending writes to the drive, the higher the probability that a subsequent write supersedes an earlier one. This "write folding" is a key foundation of reducing the amount of data SRDF/A has to transmit, and it will have a similar effect in reducing the number of writes a flash drive has to handle.
- Minimized Code Paths: when the source of a read is a flash drive, any code to determine whether read-optimization algorithms should be engaged is skipped. The code path must also be minimized to handle flash drive response times (microseconds versus milliseconds).
- Turn Off Sequential Prefetch: knowing that the flash drive itself has already fetched the "rest of the track" into its SDRAM buffer should it be needed.
- Turn Off I/O Re-ordering: since there's no rotational latency or seek time to optimize with flash drives.
- Rebuild all the drives at once: in the rare event of flash drive failures, all affected drives are rebuilt at once instead of sequentially, since there's no real performance difference or overhead for totally random vs. sequential requests.
- Ensure flash drives don’t starve the hard drives: use the DMX’s Priority Controls logic to ensure that "lesser drives" aren't starved.
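The write-folding behavior described in the list above can be illustrated with a toy buffer in which a later write to a block supersedes an earlier pending one, so fewer writes reach the flash drive. This is a conceptual sketch, not EMC's microcode:

```python
# Conceptual sketch of write folding: buffer writes in cache so a later
# write to the same block replaces an earlier pending one, reducing the
# writes the flash drive actually sees. Not EMC's actual microcode.

class WriteFoldingBuffer:
    def __init__(self):
        self.pending = {}          # block_id -> latest data
        self.writes_received = 0

    def write(self, block_id, data):
        self.writes_received += 1
        self.pending[block_id] = data   # later write folds over earlier one

    def destage(self):
        """Flush pending writes to the drive; return count actually sent."""
        sent = len(self.pending)
        self.pending.clear()
        return sent

buf = WriteFoldingBuffer()
for block, data in [(5, "a"), (9, "b"), (5, "c"), (5, "d")]:
    buf.write(block, data)
print(buf.writes_received, buf.destage())   # 4 writes folded into 2
```

The longer destaging is delayed, the more opportunities there are for folding, which is exactly the wear-reduction argument made for the DMX cache.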
And, third, let's carefully read EMC's deployment advice.
EMC Deployment Advice for Flash Drives in DMX: The greatest improvements will be seen with higher cache read-miss workloads, owing to the lack of rotational and seek latency in flash drives. Flash drives are most beneficial with random read misses (RRM). If the RRM percentage is low, flash drives may show less benefit, since writes and sequential reads/writes already leverage DMX cache to achieve the lowest possible response times. For example, if the read hit percentage is high (> 95%) as compared to read misses, such as in workloads of decision support systems (DSS) or streaming media, improvements provided by flash drives will not likely be enough to be cost-effective.
Action item: Users should identify suitable applications and consider replacing 10-15 hard drives with one flash drive to save power and cooling while drastically improving performance. Users should also focus on the cost per I/O rather than the cost per gigabyte. EMC should provide configuration-planning tools that help identify candidate applications and balance the ratio of flash drives to hard drives.