The title refers to Bobby Locke's immortal phrase (see Note 1), frequently quoted by golfers. The analogy applies to storage systems: as Wikibon has consistently pointed out, for most workloads write performance is the critical factor determining application performance. This is especially true for database systems of record, where performance directly drives business value.
Wikibon has previously analyzed hybrid storage arrays and their performance characteristics. In general, these boxes have been very attractive for mid-sized and lower-end data centers, especially when they have wanted a single or small number of arrays, and when there has been a challenging IO workload (e.g., database, technical, VDI etc.).
When Oracle purchased Sun in 2009, it completely changed the way in which practitioners needed to view Oracle's hardware partners. EMC, NetApp, HDS, HP, and others were no longer hardware partners; overnight they became competitors. Oracle's intent with hardware was to transform the company into a provider of what it calls engineered systems, i.e., converged infrastructure specially tuned and designed for Oracle database systems.
Wikibon has been consistently advising its clients to use caution with respect to Oracle hardware purchases because Oracle was no longer "hardware agnostic". Specifically, Oracle wanted customers to buy its engineered systems rather than competitive products, and while Oracle's value proposition could be alluring, it wasn't necessarily appropriate for all clients or all workloads.
The Oracle ZFS Storage Appliance is aimed primarily at supporting this integrated vision, and this research note is intended to help customers understand where the appliance fits and where it doesn't. The product is positioned at the high end of the data center market, in part as a solution for backing up large Oracle databases, and in part for high-performance write environments, i.e., those typified by Oracle's largest customers running transactional database applications. The appliance's design is a true “flash-first” hybrid storage architecture. The features of ZFS include:
- Integrated file system and volume manager;
- Support for very high storage capacities, i.e., 2PB (the theoretical maximum is 256 quadrillion zettabytes (ZB), where a zettabyte is 2^70 bytes);
- Sophisticated protection against data corruption with autonomic repair;
- Snapshots and copy-on-write clones.
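The theoretical capacity figure is easier to check as a power of two. Reading "quadrillion" loosely as the binary multiple 2^50 and a zettabyte as 2^70 bytes (an interpretation for illustration, not a statement from Oracle), the limit works out to 2^128 bytes, which is consistent with ZFS being described as a 128-bit file system:

```latex
256 \times 2^{50} \times 2^{70}\ \text{bytes}
  = 2^{8} \times 2^{50} \times 2^{70}\ \text{bytes}
  = 2^{128}\ \text{bytes}
  \approx 3.4 \times 10^{38}\ \text{bytes}
```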
Wikibon’s analysis concludes that the ZFS appliance is indeed a true hybrid, allowing high continuous read and write rates with sustained low latency. This makes it suitable for high-performance streaming workloads such as Oracle database backups. Notably, and true to form, Oracle has integrated it closely with RMAN, Oracle’s backup utility, and with other Oracle features such as Hybrid Columnar Compression.
To meet these objectives, Wikibon analyzed the architecture and performance of the ZFS Appliance in depth and compared it with "traditional dual-controller storage arrays" (e.g., mainstream mid-market arrays from EMC, NetApp, HDS, HP, etc.) in high-write environments. Wikibon found that, in common with all flash-first true hybrids, latency was at or just below 1 millisecond for both read and write IOs, and latency variance was very low. Wikibon talked to a number of practitioner members using the product, who reported that throughput of more than 25 terabytes/hour could be achieved with an in-house ZFS benchmark, and that a sustained 11.5TB/hour was achieved for a backup workload on the ZFS appliance (equivalent to 23,000 IOPS; see the methodology section for more details). Wikibon analysis also found that the overall cost of purchase, deployment and operation was significantly lower than that of traditional storage arrays or filers for IO- and write-intensive workloads, as shown in Figure 1 below. For low-end customers with less demanding IO requirements and more read-intensive workloads, however, the ZFS product is likely to be overkill. Clients are advised to assess their current and future workload requirements and to choose the array that best fits them from a performance standpoint.
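The text equates 11.5 TB/hour with about 23,000 IOPS but does not state the IO size behind that conversion. The sketch below, which assumes decimal units (1 TB = 10^12 bytes, 1 KB = 1,000 bytes), converts a sustained rate into bytes per second and infers the average IO size implied by a throughput/IOPS pair. The function names are illustrative, not part of any Wikibon or Oracle tool.

```python
# Sketch: relate a sustained transfer rate (TB/hour) to IOPS.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 KB = 1,000 bytes).
# The average IO size is NOT given in the text; it is inferred here
# from the quoted throughput/IOPS pairs, not measured.

BYTES_PER_TB = 10**12
SECONDS_PER_HOUR = 3600


def tb_per_hour_to_bytes_per_sec(tb_per_hour: float) -> float:
    """Convert a sustained rate in TB/hour to bytes/second."""
    return tb_per_hour * BYTES_PER_TB / SECONDS_PER_HOUR


def implied_io_size_kb(tb_per_hour: float, iops: float) -> float:
    """Average IO size (KB) implied by a throughput and an IOPS rate."""
    return tb_per_hour_to_bytes_per_sec(tb_per_hour) / iops / 1000


def iops_for_rate(tb_per_hour: float, io_size_kb: float) -> float:
    """IOPS needed to sustain tb_per_hour at a given average IO size (KB)."""
    return tb_per_hour_to_bytes_per_sec(tb_per_hour) / (io_size_kb * 1000)


if __name__ == "__main__":
    print(f"11.5 TB/hour = {tb_per_hour_to_bytes_per_sec(11.5) / 1e9:.2f} GB/s")
    print(f"implied IO size at 23,000 IOPS: {implied_io_size_kb(11.5, 23_000):.0f} KB")
    print(f"implied IO size at 2,000 IOPS (1 TB/hour): {implied_io_size_kb(1, 2_000):.0f} KB")
```

Both pairs quoted in this note (11.5 TB/hour at 23,000 IOPS, and 1 TB/hour at 2,000 IOPS) imply an average transfer of roughly 139 KB, which is plausible for large sequential backup IOs; treat that as an inference from the numbers, not a figure reported by the study.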
Figure 1 shows:
- Write IOs are significantly more expensive to perform than read IOs;
- Traditional arrays/filers are 2-3 times more expensive than true-hybrid flash-first storage arrays;
- The higher the write rate, the greater the advantage of true-hybrid flash-first storage arrays.
Wikibon believes that the Oracle ZFS appliance is an ideal strategic fit for high-throughput streaming environments such as database backups, and can be well integrated into high-performance Oracle database environments. In these and similar workloads, the ZFS appliance provides a best-of-breed, cost-effective solution relative to many products on the market.
Introduction to ZFS Storage Appliance
The ZFS Appliance is fundamentally designed with a flash-first hybrid architecture. The read IO cache uses MLC SSDs, and the write cache uses the higher-performing but higher-cost SLC SSDs.
ZFS Appliance (Note 2)
The ZFS Storage Appliance consists of the following components:
- The Oracle ZFS File System and Solaris Operating System;
- The Oracle ZFS Appliance management software, including DTrace;
- The Oracle 2.5” storage modules for hard disk drives (HDD) and flash read and write storage (SSD);
- Write Optimized Cache consisting of sets of 4 SSDs with 288GB of SLC flash storage, with a maximum write rate of about 1.3 GBytes/second, and a sustained write rate at 70% loading of 1.3 x 0.7 = 0.91 gigabytes/second;
- One or more Storage Appliance Controllers, holding up to 1 TB of DRAM each.
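The write-cache figures above lend themselves to a rough sizing check. The sketch below reads the 1.3 GB/s peak as applying to one set of 4 SSDs (an assumption; the text does not say so explicitly) and, purely for illustration, assumes that every byte written passes through the write cache and that throughput scales linearly with the number of sets. `sets_needed` is a hypothetical helper, not an Oracle sizing tool.

```python
import math

# Figures quoted in the text for the Write Optimized Cache. Reading the
# 1.3 GB/s peak as applying to one set of 4 SSDs is an assumption.
PEAK_GB_PER_SEC = 1.3
SUSTAINED_LOADING = 0.7   # 70% loading for sustained operation
SECONDS_PER_HOUR = 3600


def sustained_gb_per_sec(peak: float = PEAK_GB_PER_SEC,
                         loading: float = SUSTAINED_LOADING) -> float:
    """Sustained write rate of one write-cache set: 1.3 x 0.7 = 0.91 GB/s."""
    return peak * loading


def sets_needed(target_tb_per_hour: float) -> int:
    """Hypothetical sizing rule: write-cache sets needed for a target rate.

    Assumes every byte written passes through the write cache and that
    throughput scales linearly with the number of sets.
    """
    target_gb_per_sec = target_tb_per_hour * 1000 / SECONDS_PER_HOUR
    return math.ceil(target_gb_per_sec / sustained_gb_per_sec())


if __name__ == "__main__":
    for rate in (1, 11.5, 25):
        print(f"{rate} TB/hour -> {sets_needed(rate)} write-cache set(s)")
```

Under these assumptions, the 11.5 TB/hour backup rate reported by practitioners (about 3.2 GB/s) would need four sets running at 70% loading.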
All storage services are included in the appliance, including:
- Access as an NFS or iSCSI device;
- Hybrid Storage Pools with automated use of four tiers of storage;
- ZFS file system with error detection and self-healing capabilities;
- Built-in, in-line de-duplication combined with ZFS compression;
- Simultaneous multi-protocol support across multiple network interconnects, including GbE, 10 GbE, Fibre Channel, and InfiniBand;
- Hybrid Columnar Compression support for Oracle Databases.
DTrace
DTrace is an integrated package providing real-time storage analytics together with good visual presentation of the data. This provides all the data required to identify, troubleshoot, and resolve performance issues. Of the hybrid and traditional arrays reviewed by Wikibon, only Tintri offers data of equivalent quality (at a lower configuration level, and for VMware environments only). Significantly, we believe that as much as 30% of ZFS appliances go into VMware environments, which have historically been very tricky to diagnose from a performance standpoint. Practitioners we spoke with indicated that Oracle's support in VMware environments has been solid, despite concerns among many Wikibon members about virtualizing Oracle apps.
Oracle Snap Management Utility for Oracle Database
The Oracle Snap Management Utility for Oracle Database is a standalone management tool designed to work with the Sun ZFS Storage Appliance. It provides:
- A simple and automatic way for DBAs to back up, restore, clone, and provision Oracle Databases stored on a Sun ZFS Storage Appliance via a graphical user interface;
- One-step provisioning of database copies to create (for example) test and development environments;
- Support for any Oracle Database 10g or Oracle Database 11g deployed on the Sun ZFS Storage Appliance.
The Snap Management Utility uses the snapshot, clone, and rollback capabilities of the Sun ZFS Storage Appliance together with standard host-side processing, so that all operations leave the database in a consistent state.
Direct NFS Cloning
Clonedb is a Direct NFS (dNFS) feature that provides an alternative to traditional RMAN database duplication. Clonedb uses dNFS technology to create a clone that uses an existing backup of a database as its data store, with copy-on-write technology. Only changed blocks need to be stored in the clone, while unchanged data is read directly from the backup files. This significantly increases the speed of cloning a system and means that many separate clones can be created against a single set of backup data files, saving space.
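The copy-on-write idea behind Clonedb (and ZFS clones generally) can be shown in a few lines. This is a conceptual sketch, not Oracle's dNFS implementation: `BackupImage` and `CowClone` are hypothetical names, and "blocks" are simply small byte strings. Each clone keeps only the blocks it has changed and reads everything else from the shared, read-only backup.

```python
from typing import Dict, Sequence


class BackupImage:
    """Read-only block image standing in for an existing database backup."""

    def __init__(self, blocks: Sequence[bytes]):
        self._blocks = tuple(blocks)  # never modified after creation

    def read(self, block_no: int) -> bytes:
        return self._blocks[block_no]


class CowClone:
    """Copy-on-write clone that stores only the blocks changed since creation."""

    def __init__(self, base: BackupImage):
        self.base = base
        self.changed: Dict[int, bytes] = {}

    def read(self, block_no: int) -> bytes:
        # Changed blocks come from the clone; unchanged ones from the backup.
        if block_no in self.changed:
            return self.changed[block_no]
        return self.base.read(block_no)

    def write(self, block_no: int, data: bytes) -> None:
        # The backup stays untouched; the new version lives only in the clone.
        self.changed[block_no] = data

    def blocks_stored(self) -> int:
        return len(self.changed)


if __name__ == "__main__":
    backup = BackupImage([b"A", b"B", b"C"])
    dev, test = CowClone(backup), CowClone(backup)  # two clones, one backup
    dev.write(1, b"B2")
    print(dev.read(1), test.read(1), dev.blocks_stored(), test.blocks_stored())
```

Because clones share the backup's unchanged blocks, the space cost of each additional test or development copy grows only with the data it modifies.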
How Read and Write Caching Work
Caching read IOs is relatively simple, as the master copy is already on persistent storage. If there is any problem or failure with the cache, the data is simply re-read from persistent storage. Read DRAM caches are held in the controller. Read flash caches can be much bigger, and use lower-cost MLC flash technology, as only one copy is necessary.
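A minimal sketch of why read caching is the easy case: the cache below sits in front of a backing read function that always holds the master copy, so clearing the cache loses nothing. Least-recently-used eviction is an assumption made for simplicity; ZFS itself uses a more sophisticated adaptive replacement scheme, and `ReadCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Callable


class ReadCache:
    """LRU read cache in front of persistent storage (illustrative sketch)."""

    def __init__(self, capacity: int, backing_read: Callable[[int], bytes]):
        self.capacity = capacity
        self.backing_read = backing_read  # master copy on persistent storage
        self.entries: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.entries:
            self.entries.move_to_end(block_no)  # mark most recently used
            self.hits += 1
            return self.entries[block_no]
        self.misses += 1
        data = self.backing_read(block_no)
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return data

    def invalidate_all(self) -> None:
        # Losing a read cache loses no data: later reads simply go back
        # to persistent storage, which still holds the master copy.
        self.entries.clear()
```

Only one copy of each cached block is needed, which is why lower-cost MLC flash is acceptable for this tier.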
Global write DRAM caches are more complex and more expensive, and have to be protected at every stage. The non-volatile (NV) write cache is part of the main DRAM memory in the controller. A write IO is written to the non-volatile write cache, and a second copy is written to the DRAM cache in the other controller. The write is complete from the application's viewpoint as soon as both write acknowledgements are received. A background task scoops up these writes from the controller caches and writes them out to disk, coalescing them into larger blocks where possible. When the data has been written, one of the copies in the controller cache is purged, and one copy is usually retained in main storage until paged out. Because the DRAM cache is small, the controller is often designed to detect sequential data and bypass the cache. In systems without flash cache, the small amount of non-volatile DRAM write cache restricts write performance, and the bottleneck is writing the blocks out from DRAM. This caching process can easily saturate the NV controller cache, at which point the data has to be written directly to disk. This leads to much lower write rates and jitter in write IO response times.
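The write path just described can be sketched as follows. This is a simplified model under stated assumptions, not any vendor's controller logic: blocks are identified by integer numbers, capacity is counted in blocks, a `False` return stands in for "cache saturated, write through to disk", and destaging coalesces adjacent dirty blocks into contiguous runs. `MirroredWriteCache` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


class MirroredWriteCache:
    """Sketch of a mirrored NV write cache across two controllers."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.local: Dict[int, bytes] = {}    # NV cache in the owning controller
        self.partner: Dict[int, bytes] = {}  # mirror copy in the other controller
        self.dirty: Set[int] = set()         # blocks not yet written to disk
        self.disk: Dict[int, bytes] = {}

    def write(self, block_no: int, data: bytes) -> bool:
        """Acknowledge (True) once both copies are held; False if saturated."""
        if block_no not in self.dirty and len(self.dirty) >= self.capacity:
            return False  # cache full: this write must go directly to disk
        self.local[block_no] = data
        self.partner[block_no] = data
        self.dirty.add(block_no)
        return True

    def destage(self) -> List[Tuple[int, int]]:
        """Background task: write dirty blocks to disk as (start, length) runs."""
        runs: List[Tuple[int, int]] = []
        for block_no in sorted(self.dirty):
            if runs and runs[-1][0] + runs[-1][1] == block_no:
                start, length = runs[-1]
                runs[-1] = (start, length + 1)  # extend a contiguous run
            else:
                runs.append((block_no, 1))      # start a new run
        for start, length in runs:
            for b in range(start, start + length):
                self.disk[b] = self.local[b]
        # Data is now on disk: purge the mirror copy and keep the local copy
        # as clean cached data (eviction of clean data is not modelled).
        self.partner.clear()
        self.dirty.clear()
        return runs
```

The saturation branch in `write` is the behavior the paragraph warns about: once dirty data fills the small NV cache, new writes can no longer be acknowledged from memory.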
Flash is the fastest form of persistent storage. The simplest way to speed up writes is to write them directly to flash (e.g., flash SSDs). However, this is not practical in a general-purpose array, as the flash storage would have to be dedicated to the specific volumes being written, using a manual process. Some systems have multiple flash cache write pools (e.g., NetApp Flash Pools) built from a mixture of SSDs and HDDs. Multiple pools with flash cache are usually required to meet different RAID and performance requirements, and again administrative effort is required to deploy workloads to the correct pools. The most efficient solution, in both hardware and operational terms, is to use global flash caches.
Global write flash caches work like DRAM caches, except that the media is persistent. Flash write caches can be much larger, and to protect the data, multiple copies are kept, generally on higher-performance, longer-lasting SLC flash. The bottleneck now is the speed at which data can be written from the flash cache (which is slower than DRAM) to the back-end storage media.
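The difference between a small DRAM write cache and a large flash write cache can be seen with a simple fluid model: when writes arrive faster than they are destaged, the backlog grows at the difference between the two rates until the cache is full. In the sketch below, the 16 GB NV DRAM size and the 1.5 GB/s ingest and 1.0 GB/s destage rates are hypothetical; 5,200 GB echoes the 5.2 TB usable flash write cache assumed in Wikibon's model. For simplicity, both caches are given the same destage rate, although the text notes that destaging from flash is slower than from DRAM.

```python
import math


def seconds_until_full(cache_gb: float, ingest_gb_s: float,
                       destage_gb_s: float) -> float:
    """Time for a write cache to fill when writes arrive faster than they drain.

    Simple fluid model: the backlog grows at (ingest - destage) GB/s and the
    cache is saturated once that backlog reaches its capacity.
    """
    backlog_growth = ingest_gb_s - destage_gb_s
    if backlog_growth <= 0:
        return math.inf  # destaging keeps up, so the cache never saturates
    return cache_gb / backlog_growth


if __name__ == "__main__":
    # Hypothetical rates: 1.5 GB/s of incoming writes, 1.0 GB/s destaged.
    for name, size_gb in (("NV DRAM cache", 16), ("flash write cache", 5_200)):
        t = seconds_until_full(size_gb, 1.5, 1.0)
        print(f"{name:18s} {size_gb:>6} GB full after {t:,.0f} s ({t / 3600:.2f} h)")
```

With these inputs, the DRAM cache saturates in about half a minute, while the flash cache absorbs the same surplus for almost three hours, which is the kind of headroom a long backup stream needs.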
To investigate write performance, Wikibon analyzed hybrid storage arrays with global write caches, and compared them with traditional arrays without global write caches.
There are a number of hybrid storage arrays with global flash caches. These include:
- New hybrid storage arrays such as Tegile, Tintri, Starboard and the Oracle ZFS appliance, designed from scratch as a flash-first architecture. These products often have inline compression/de-duplication to lower flash costs.
- Dual-controller traditional arrays such as the newly announced EMC VNX and the Dell Compellent, which have been upgraded with global write flash caches.
Note: These systems, like the ZFS appliance, acknowledge a write directly to flash and persist data without having to go to spinning disk.
To analyze the impact of hybrid storage on large-scale high write environments, Wikibon constructed a performance model based on previous work on hybrid storage. The Oracle ZFS Storage Appliance was used as the reference model for hybrid storage, as it has the most aggressive implementation of flash-based write technology. The design elements that contribute to the write performance on the ZFS appliance include:
- Very large controller memory DRAM (up to 2 terabytes, 512GB assumed in the model)
- Separate and very large global flash read cache (up to 4 terabytes MLC, varied in model to meet read IOPS)
- Separate and very large global flash write cache (up to 10.5 terabytes SLC, 5.2 after dual writes, varied in model to meet write IOPS)
- Inline compression & de-duplication (reducing data written)
- ZFS file system integrated within array controller
- High degree of parallelism of IO and effective use of up to 10 core processors
The traditional storage arrays/filers that were modeled were modular dual-controller storage systems without global flash caches, such as the Hitachi AMS systems, the NetApp FAS series and many other products in this category. So-called Tier-1 arrays such as the EMC VMAX, HDS VSP, and IBM 8000 series were not modeled, as they were well outside the solution price envelope being considered. The maximum DRAM storage per controller for this class of array/filer is about 32-48 GB. The amount of controller cache was modeled to meet the IOPS target at minimum cost. For read IOs on the traditional storage arrays, a read flash cache of between 0 and 1,024 gigabytes was configured, again to meet the IOPS target at the lowest cost.
Particular attention was paid in the model to ensuring that the write rate was sustainable, and not just achievable in short bursts. The model was then tested and calibrated by discussing the practical, real-world performance experiences of the ZFS appliance and traditional storage systems with experienced practitioners. We talked in depth with six such customers about their performance experiences with the ZFS appliance and with traditional arrays. Figure 2 in the Findings section shows the resulting equipment costs, and Table 1 in the Footnotes gives the detailed assumptions.
The general feedback from the ZFS appliance practitioners was positive:
- Praise for the performance of the ZFS, particularly in backup (high-write) environments;
- 7 gigabytes/second write rates achieved in benchmarks;
- 11 terabytes/hour sustained over 2.5 hours for backup, compared with 1 terabyte/hour for a traditional storage device;
- ZFS snapshots and clones universally praised;
- DTrace was praised for the quality and completeness of the performance analytic tool;
- Compression was used by only a minority of the respondents, but the performance was strongly praised (up to 16x compression), especially for reads;
- None of the respondents needed to tune the ZFS read or write caching - performance maintenance was minimal;
- No problems with availability.
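The practitioner backup figures translate directly into backup windows, which is what drives the RPO benefit cited in the recommendations. The sketch below assumes a full backup streamed at a constant sustained rate, ignoring compression, incremental backups and restore time; `backup_window_hours` is a hypothetical helper.

```python
def backup_window_hours(database_tb: float, rate_tb_per_hour: float) -> float:
    """Hours needed to stream a full backup at a constant sustained rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return database_tb / rate_tb_per_hour


if __name__ == "__main__":
    moved_tb = 11 * 2.5  # data moved in the reported 2.5-hour run at 11 TB/hour
    for label, rate in (("ZFS appliance", 11.0), ("traditional device", 1.0)):
        print(f"{label:18s} {backup_window_hours(moved_tb, rate):5.1f} h for {moved_tb} TB")
```

At the reported rates, the 27.5 TB moved in 2.5 hours on the ZFS appliance would take 27.5 hours on the traditional device.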
The only areas of concern were:
- Replication could use some improvements in the related user tools;
- For a database company, Oracle's support system could be better;
- Broader support for VMware APIs, particularly the NFS APIs, would be welcome.
Wikibon also talked to the practitioners about the real-world cost of acquiring, deploying, and maintaining the storage environment. Wikibon used that input and other sources of data to develop an economic model to look at the potential benefits of using high-performance hybrid arrays for high-write workloads. We modeled these results against traditional arrays to determine the sweet spot of the ZFS appliance. To this end, Wikibon selected three scenarios for detailed financial examination:
- Base case: 100 terabytes with 20% writes;
- High-write case: 200 terabytes with 35% writes;
- Large-scale high-write case: 400 terabytes with 50% writes.
Figures 1, 3, 4 and 5 show the financial total cost of ownership analysis.
The major findings of the research are shown in Figures 2-5 below.
Figure 2 plots IO density (IOs per terabyte) on the x-axis against the capital cost of the solution, including hardware and software, on the y-axis. The four lines are the cost curves for the ZFS at 50% writes and for traditional arrays at 20%, 35% and 50% writes.
Figure 2 shows that traditional arrays are most cost-effective at low performance and low scale. For a 20% write environment, traditional arrays are lower cost or competitive up to 7,000 IOs per terabyte. However, at higher IO rates and higher write ratios, the ZFS rapidly becomes a much lower-cost and more cost-stable platform. This finding is in line with the experience of the practitioners Wikibon interviewed, who found rapidly increasing costs and difficulties with growing IO rates on traditional arrays and filers. For reference, the 11 TB/hour rate cited in the methodology section would require 23,000 IOPS on the ZFS, and the 1 TB/hour rate would require 2,000 IOPS on the traditional array, exactly the point on the chart where the performance curves cross.
Figure 3 compares the 4-year total cost of ownership of the hybrid solution with that of the traditional array/filer solution for an environment with 100 terabytes, 1,000,000 IOPS and 20% writes. This is equivalent to 10,000 IOPS/TB, in the middle of the x-axis in Figure 2. The analysis includes the cost of maintenance (18%/year of capital costs), implementation, operations, and power, space and cooling ($150/sq. ft. per year and $0.10/kWh for power, with a PUE of 2.5). The cost of the traditional system is 194% higher than that of the hybrid system. The detailed business case is shown in Table 2 in the footnotes.
Figure 4 compares the 4-year total cost of ownership of the hybrid solution with that of the traditional array/filer solution for an environment with 200 terabytes, 2,000,000 IOPS and 35% writes. This again is equivalent to 10,000 IOPS/TB, in the middle of the x-axis in Figure 2. The analysis includes the cost of maintenance (18%/year of capital costs), implementation, operations, and power, space and cooling ($150/sq. ft. per year and $0.10/kWh for power, with a PUE of 2.5). The cost of the traditional system is 241% higher than that of the hybrid system. The detailed business case is shown in Table 3 in the footnotes.
Figure 5 compares the 4-year total cost of ownership of the hybrid solution with that of the traditional array/filer solution for an environment with 400 terabytes, 4,000,000 IOPS and an aggressive 50% writes. This again is equivalent to 10,000 IOPS/TB, in the middle of the x-axis in Figure 2. The analysis includes the cost of maintenance (18%/year of capital costs), implementation, operations, and power, space and cooling ($150/sq. ft. per year and $0.10/kWh for power, with a PUE of 2.5). The cost of the traditional system is 266% higher than that of the hybrid system. The detailed business case is shown in Table 4 in the footnotes.
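The three TCO comparisons combine the same cost components. The sketch below shows one way to assemble a 4-year total from the rates stated above, assuming maintenance is charged in all four years and that applying the PUE to equipment power accounts for cooling. The equipment inputs in `Solution` (capital cost, power draw, floor space, and so on) are placeholders; Wikibon's actual inputs are in Tables 2-4 and are not reproduced here.

```python
from dataclasses import dataclass

# Rates stated in the text.
YEARS = 4
MAINTENANCE_RATE = 0.18      # per year, as a fraction of capital cost
SPACE_COST_PER_SQFT = 150.0  # $ per sq. ft. per year
POWER_COST_PER_KWH = 0.10    # $ per kWh
PUE = 2.5                    # facility power / IT power (covers cooling)
HOURS_PER_YEAR = 8760


@dataclass
class Solution:
    """Hypothetical inputs; these are not the values in Tables 2-4."""
    capex: float           # hardware + software purchase ($)
    implementation: float  # one-time deployment cost ($)
    ops_per_year: float    # operational cost ($ per year)
    it_load_kw: float      # average power drawn by the equipment (kW)
    floor_sqft: float      # floor space occupied (sq. ft.)


def four_year_tco(s: Solution) -> float:
    """Capex + implementation + ops + maintenance + power/cooling + space."""
    maintenance = MAINTENANCE_RATE * s.capex * YEARS
    power = s.it_load_kw * PUE * HOURS_PER_YEAR * POWER_COST_PER_KWH * YEARS
    space = s.floor_sqft * SPACE_COST_PER_SQFT * YEARS
    return (s.capex + s.implementation + s.ops_per_year * YEARS
            + maintenance + power + space)


def io_density(iops: float, capacity_tb: float) -> float:
    """IOPS per terabyte, the x-axis metric of Figure 2."""
    return iops / capacity_tb


def percent_more(traditional: Solution, hybrid: Solution) -> float:
    """How much more (in %) the traditional solution costs over 4 years."""
    return (four_year_tco(traditional) / four_year_tco(hybrid) - 1) * 100
```

All three scenarios sit at the same IO density (for example, 1,000,000 IOPS over 100 TB gives 10,000 IOPS/TB), so differences between them come from scale and write percentage rather than from IO density.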
The figures do not include the potential benefits of in-line de-duplication (the traditional arrays have only post-process de-duplication, run as a batch job) and compression. With these features, one respondent found that the ZFS solution cost one tenth as much as the traditional filers (a 10:1 difference). All the respondents found at least a 2:1 cost difference, in line with the case studies presented in Figures 3-5 above.
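The advantage of in-line over post-process data reduction is that duplicate or compressible data never reaches the back-end media in the first place. The sketch below illustrates the in-line approach: each block is fingerprinted as it arrives, only unseen content is compressed and stored, and logical versus physical bytes are tracked. SHA-256 fingerprints and zlib compression are stand-ins chosen for illustration; the appliance's own implementation may differ, and `InlineDedupStore` is a hypothetical name.

```python
import hashlib
import zlib
from typing import Dict


class InlineDedupStore:
    """De-duplicate and compress blocks before they are written (sketch)."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # fingerprint -> compressed block
        self.logical_bytes = 0              # bytes the application wrote
        self.physical_bytes = 0             # bytes actually stored

    def write(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        self.logical_bytes += len(data)
        if fp not in self.blocks:           # only new content reaches media
            compressed = zlib.compress(data)
            self.blocks[fp] = compressed
            self.physical_bytes += len(compressed)
        return fp

    def read(self, fp: str) -> bytes:
        return zlib.decompress(self.blocks[fp])

    def reduction_ratio(self) -> float:
        if not self.physical_bytes:
            return 1.0
        return self.logical_bytes / self.physical_bytes
```

A post-process design would instead write every block in full and reclaim duplicates later in a batch job, so it pays the full write cost up front.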
Conclusions & Recommendations
Traditional storage arrays and filers are attractive when the scale is small and the number of IOPS is below 5,000. This, in our view, is not the "sweet spot" of Oracle's ZFS Appliance. The ability of flash-based caches in these traditional arrays to improve read performance is sufficient to achieve good IO response times, and automatic tiering systems work well when the IOPS and write percentages are low. There are many solid mid-range arrays from numerous companies including Dell, EMC, HDS, HP, IBM, and NetApp that are better fits than the ZFS Appliance at this low end of the market. Figure 6 shows the left quarter of Figure 2 in more detail, illustrating the strategic fit of traditional storage arrays and filers.
Wikibon believes that flash-first based storage arrays (both flash-only and hybrid) are changing the cost structure of IO intensive applications, and profoundly reducing the cost of high-IO workloads in general, and high write-IOs in particular. Practitioners will utilize this capability to improve the performance and functionality of systems, which will lead to a significant increase in the percentage of high-IO workloads. Even just one application in a data center can dictate that a hybrid or flash-only array approach will be needed. The analysis above indicates that hybrid storage devices should be broadly adopted by practitioners and data center executives.
Wikibon members should consider the ZFS Appliance for more demanding workloads where sustained write performance and IO requirements are higher. Examples include high-performance environments such as specific backup applications and core transaction-intensive database workloads. In these situations, because of the hybrid design of the ZFS Appliance, customers will find significant savings relative to traditional disk arrays that don't scale as well. The high-end ZFS storage array is the highest-performing hybrid storage device that Wikibon has analyzed, and is in a class of its own when it comes to high write-IO environments. Wikibon believes that the best strategic fits for ZFS storage arrays are the following types of workloads:
- Oracle database backup:
  - RPO will be improved by rapid completion of backups;
  - Oracle Hybrid Columnar Compressed databases are supported;
  - The ZFS is a much lower cost solution, at least a factor of two cheaper.
- Large mail systems:
  - Many have 50% IO writes or higher (one recently analyzed by Wikibon had a 75% write rate).
- Database-intensive and other large backup requirements.
- Technical workloads with high-IO requirements.
- Workloads with high delete rates of intermediate objects or files.
- Commercial workloads with high IO and high write-IO requirements.
- VMware environments with high random IO:
  - DTrace is particularly effective in identifying hot spots in high random IO environments behind the VMware "IO blender".
Action Item: CIOs, CTOs and senior storage executives should position the Oracle ZFS appliance as an ideal strategic fit for high-throughput streaming environments such as database backups. In addition, the product can be successfully integrated into high-performance Oracle database workloads. In write-intensive and heavy IO workloads, the ZFS appliance will likely prove the best-of-breed, lowest-cost solution. Outside of that sweet spot, traditional midrange arrays and filers will often be a better economic fit.
Note 1: The title is derived from the golf saying attributed to Bobby Locke: “You drive for show, but putt for dough.”