Hybrid Storage Poised To Disrupt Traditional Disk Arrays

Last Update: Aug 04, 2013 | 12:03

Originating Author: David Floyer

1 Executive Summary
2 VM-Aware Hybrid Definition
3 VM-aware Hybrid Architecture Detail
4 Storage Cost as a Function of IOPS
5 Response Time Analysis

Executive Summary

Wikibon found strongly positive responses in the 2011 and 2012 VMware user surveys from practitioners who have installed hybrid storage arrays for virtualized environments.

The Wikibon community is interested in comparing the hybrid approach, with flash as a front end (flash-first), to the traditional array approach, with flash storage as a cache, particularly in virtualized environments. Wikibon has recently talked in depth to storage practitioners and storage executives, as well as to vendors and other consultants, about the different architectural approaches. As a result of this research, Wikibon proposes a definition of hybrid storage and a model that estimates the cost benefits of the approach.

Wikibon has analyzed the cost and performance of flash-first hybrid arrays compared with traditional architectures. In high-performance environments, the hybrid approach is clearly superior in both cost and performance. Figure 1 shows a high-level view of the data and the strategic fit for hybrid systems: for IO rates greater than 700 IOs per terabyte, the hybrid approach is superior.

Figure 1 - Comparative Cost for Traditional & Hybrid Storage Arrays as IOPS Increase per Terabyte of Usable Storage
Source: Wikibon, February 2013

Figure 2 below and Tables 1, 2 & 3 in the footnotes show the results of the analysis in more detail, from the perspective of a 10-terabyte usable array, and include response time comparisons. For example, an environment requiring 15,000 IOPS from 10 terabytes of usable storage would require:

64 drives and 1 TB of flash in a traditional storage array with a flash cache;
16 drives, including 2.4TB of flash, in a hybrid storage array.

The traditional storage array would cost more than twice as much as the hybrid ($190,000 vs. $88,000);

The response time of the traditional array would be three times that of the hybrid storage array (5.1ms vs. 1.7ms);
For databases such as Microsoft SQL Server or Oracle Standard Edition, the lower latency creates the potential to reduce the number of cores required to provide the necessary performance, and to redeploy database licenses that cost between $15,000 and $20,000 per processor core.
In lower-performance environments (without latency-sensitive workloads such as databases or VDI), traditional storage arrays are likely to be the lower-cost option, and the improved performance of the hybrid array is less likely to justify its cost.
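The two ratios quoted in this example follow directly from the figures given. The snippet below is a trivial arithmetic check of those figures, not part of Wikibon's model.

```python
# Figures quoted in the example: traditional array with flash cache vs. hybrid array
traditional = {"cost_usd": 190_000, "response_ms": 5.1}
hybrid = {"cost_usd": 88_000, "response_ms": 1.7}

cost_ratio = traditional["cost_usd"] / hybrid["cost_usd"]            # "more than twice as much"
rt_ratio = traditional["response_ms"] / hybrid["response_ms"]        # "three times" the response time

print(f"cost ratio: {cost_ratio:.2f}x, response-time ratio: {rt_ratio:.2f}x")
```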

For very large environments, flash-only arrays may be attractive, with a policy of migrating data to capacity disk as its IO demand falls. Over time, flash will continue to become cheaper relative to disk, and flash-first arrays (either hybrid or flash-only) are likely to become the standard high-performance arrays.

Senior IT executives of SMBs and divisions of large companies will find a "true" hybrid storage technology of great value in containing costs and improving performance. One area of focus for this low-latency technology is optimizing infrastructure software expenditure, especially in a virtualized environment. Low-latency VM-aware storage will minimize the effort required to manage systems, allow more virtual machines to run, and reduce the number of cores required to run database environments. Applied with purpose, low latency will decrease software budgets and improve end-user productivity.

VM-Aware Hybrid Definition

Wikibon suggests that hybrid storage arrays have three main characteristics:

The IO queues for each virtual machine are fully reflected and managed in both the hypervisor and the storage array, with a single point of control for any change of priority.
All data is initially written to flash (flash-first).
Virtual machine storage objects are mapped directly to objects held in the storage array.

All these characteristics are very important to high-performance virtual environments and lead to a significantly lower cost of storage and improved performance. They are less important in low-performance virtual environments, where traditional lower-function storage arrays or JBODs may provide lower-cost solutions. This is explored in depth in the section “Storage Cost as a Function of IOPS” below.

Vendors have made many marketing claims that traditional storage arrays with some flash added can be called hybrid storage arrays. Adding flash will always improve performance to some extent. However, Wikibon does not believe that this implementation will provide the same cost and performance advantages as a full implementation of a flash-first VM-Aware hybrid architecture.

VM-aware Hybrid Architecture Detail

This section expands on the definition of VM-Aware Hybrid Storage above and explores the differences in architecture in detail. Readers more interested in the business implications can skip this section or read it later.

The initial writing of all data is to the flash layer:
- In most implementations, two copies of the data will initially be written to the flash layer, spread across the flash.
- The data is then trickled down asynchronously to the disk drive layer over time, where it is duplicated across multiple drives. The second copy in flash can then be eliminated.
- The flash layer is architected to spread the "wear-out" evenly across all the flash resources.
- The data is first held in a non-volatile cache (e.g., capacitor-protected DRAM) and is de-duplicated and compressed before being written to flash.
- The data is written and organized to fit the write performance characteristics of NAND flash (e.g., as a virtualized log-structured file).
- This approach allows large amounts of flash storage to be implemented as the front-end storage. The flash storage is better managed as a single logical unit than as separate storage pools.
- In contrast, in traditional arrays, the master copy of all data is held on disk. The cache holds read-only data, and writes are transmitted to disk. The non-volatile memory (NVM) in the storage controller cache is used to speed up random disk writes; sequential writes bypass the cache and go directly to disk. The requirement to push data down to disk early results in longer IO response times and increases the number of disk drives required to meet the performance requirements.
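A toy model can make the flash-first write path above concrete. This is a minimal sketch under stated assumptions, not any vendor's design: SHA-256 content hashing (for de-duplication) and zlib (for compression) are swapped in for illustration because the source names no algorithms, a single `destage()` call stands in for the asynchronous trickle-down, and the NVRAM buffer, wear leveling, log-structured layout, disk mirroring, and garbage collection of overwritten blocks are not modeled. The class `FlashFirstArray` and its methods are hypothetical.

```python
import hashlib
import zlib
from collections import deque


class FlashFirstArray:
    """Toy flash-first write path (hypothetical; not any vendor's design)."""

    def __init__(self):
        self.flash = {}          # content digest -> compressed payload held in flash
        self.flash_copies = {}   # content digest -> flash copies (2 until destaged)
        self.disk = {}           # content digest -> compressed payload on disk
        self.block_map = {}      # logical block address -> content digest
        self.pending = deque()   # digests written to flash but not yet destaged

    def write(self, lba, data):
        """Every new write lands in flash first; returns False for duplicate content."""
        digest = hashlib.sha256(data).hexdigest()
        self.block_map[lba] = digest
        if digest in self.flash or digest in self.disk:
            return False                       # de-duplicated: content already stored
        self.flash[digest] = zlib.compress(data)
        self.flash_copies[digest] = 2          # mirrored in flash until it reaches disk
        self.pending.append(digest)
        return True

    def destage(self):
        """Trickle pending data down to disk, then drop the second flash copy."""
        while self.pending:
            digest = self.pending.popleft()
            self.disk[digest] = self.flash[digest]
            self.flash_copies[digest] = 1

    def read(self, lba):
        """Serve from flash when present, otherwise from disk."""
        digest = self.block_map[lba]
        payload = self.flash.get(digest, self.disk.get(digest))
        return zlib.decompress(payload)


array = FlashFirstArray()
array.write(0, b"x" * 4096)   # new content: two copies held in flash
array.write(8, b"x" * 4096)   # same content at another address: de-duplicated
array.destage()               # in a real array this would run asynchronously
```

The contrast with a traditional array is in `write()`: the write completes once the data is in flash, and disk is touched only later by `destage()`, rather than disk holding the master copy from the start.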

VM-aware IO Queue Management favors flash over disk:
- When applications run in a virtualized environment, IO is managed by the hypervisor. As a result, the IO from the many virtual machines running the applications is “blended” and appears to the storage as random reads and writes.
- This is an inefficient way to use traditional disks, which are most efficient with large blocks of sequential data.
- Flash storage is well suited to random reads and writes.
- VMware provides a separate IO queue for every virtual machine, and VMware controls allow for prioritization between the IO queues.
- In hybrid architectures, this queue and priority information is passed to the storage array; traditional storage arrays do not receive it.
- In hybrid storage environments, the IO queues for each virtual machine are managed independently. This makes complete performance analysis available to the administrator, breaking out IO, CPU, and network performance for each virtual machine.
- Changes to priorities are made in one place (the hypervisor manager), and will ripple through automatically to the storage subsystem.
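The per-VM queue handling described above can be sketched as a storage-side scheduler that keeps one queue per virtual machine and reads priorities from a single shared control point. This is an illustrative sketch only: the scheduling policy is smooth weighted round-robin, swapped in as a simple proportional-share technique, and is not a description of VMware's or any array's actual mechanism. `PriorityControl`, `VMAwareScheduler`, and the VM names are hypothetical.

```python
from collections import deque


class PriorityControl:
    """Hypothetical single point of control: one shares table, edited in the
    hypervisor manager and read directly by the storage-side scheduler."""

    def __init__(self):
        self.shares = {}

    def set_shares(self, vm, shares):
        self.shares[vm] = shares


class VMAwareScheduler:
    """Keeps one IO queue per VM and services the non-empty queues with
    smooth weighted round-robin on their shares (default weight 1)."""

    def __init__(self, control):
        self.control = control
        self.queues = {}     # vm -> deque of pending IOs
        self.credit = {}     # vm -> running credit for weighted round-robin
        self.completed = {}  # vm -> IOs dispatched (per-VM accounting)

    def submit(self, vm, io):
        self.queues.setdefault(vm, deque()).append(io)
        self.credit.setdefault(vm, 0)
        self.completed.setdefault(vm, 0)

    def dispatch(self, n):
        """Dispatch up to n IOs; returns (vm, io) pairs in service order."""
        order = []
        for _ in range(n):
            active = [vm for vm, q in self.queues.items() if q]
            if not active:
                break
            weights = {vm: self.control.shares.get(vm, 1) for vm in active}
            for vm in active:
                self.credit[vm] += weights[vm]
            chosen = max(active, key=lambda vm: self.credit[vm])
            self.credit[chosen] -= sum(weights.values())
            order.append((chosen, self.queues[chosen].popleft()))
            self.completed[chosen] += 1
        return order


control = PriorityControl()
control.set_shares("db01", 3)
control.set_shares("web01", 1)
sched = VMAwareScheduler(control)
for i in range(10):
    sched.submit("db01", f"db-io-{i}")
    sched.submit("web01", f"web-io-{i}")
sched.dispatch(8)
print(sched.completed)  # {'db01': 6, 'web01': 2}
```

Because the scheduler reads `control.shares` on every dispatch, a priority change made in one place takes effect on the next IO without reconfiguring the storage side, and `completed` gives the per-VM breakdown an administrator would see.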

Virtual machine storage objects are mapped directly to objects held in the storage array:
- In a virtualized environment, each virtual machine defines the storage objects (e.g., VMDKs) associated with it.
- In the hybrid storage array, each VM storage object is mapped directly to a storage object on the array. For example, it may be mapped to a file within NFS-attached storage.
- This allows the performance of each VMDK to be measured accurately and aggregated to provide the performance information for each virtual machine.
- This approach also supports enhanced management of snapshots, backups, Storage vMotion, and other functions.
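Because each VMDK maps to its own object on the array, per-object IO counters can be rolled up to per-VM figures. The sketch below assumes the array can attribute each completed IO and its latency to a VMDK path; the datastore paths, the `VMDK_TO_VM` mapping, and both functions are hypothetical illustrations.

```python
from collections import defaultdict

# Hypothetical mapping: each VMDK is a file on an NFS datastore owned by one VM
VMDK_TO_VM = {
    "/datastore1/db01/db01.vmdk": "db01",
    "/datastore1/db01/db01_1.vmdk": "db01",
    "/datastore1/web01/web01.vmdk": "web01",
}


def per_vmdk_stats(samples):
    """samples: (vmdk_path, latency_ms), one per completed IO.
    Returns {vmdk_path: [io_count, total_latency_ms]}."""
    stats = defaultdict(lambda: [0, 0.0])
    for path, latency_ms in samples:
        stats[path][0] += 1
        stats[path][1] += latency_ms
    return dict(stats)


def roll_up_to_vm(vmdk_stats, vmdk_to_vm):
    """Sum per-VMDK counters by owning VM -> {vm: (io_count, mean_latency_ms)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for path, (ios, total_ms) in vmdk_stats.items():
        vm = vmdk_to_vm[path]
        totals[vm][0] += ios
        totals[vm][1] += total_ms
    return {vm: (ios, total_ms / ios) for vm, (ios, total_ms) in totals.items()}


samples = [
    ("/datastore1/db01/db01.vmdk", 1.0),
    ("/datastore1/db01/db01_1.vmdk", 3.0),
    ("/datastore1/web01/web01.vmdk", 2.0),
]
print(roll_up_to_vm(per_vmdk_stats(samples), VMDK_TO_VM))
```

With a LUN-based traditional array, IO from several VMs would share one LUN, so this per-object attribution would not be available to the array.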

Storage Cost as a Function of IOPS

Wikibon investigated the impact of the three components of hybrid storage in depth. Figure 2 below shows the results of the research. Wikibon found an overhead for implementing a true hybrid architecture at the low end of the array market, where IO per second (IOPS) requirements are low. However, as IOPS increase, hybrid solutions become much more cost-effective than traditional alternatives. The break-even point is 7,000 IOPS on 10 usable terabytes, based on the assumptions in Table 3 and the calculations in Tables 1 and 2. The cost of the hybrid array stays constant, while above 7,000 IOPS for 10 usable terabytes the cost of traditional storage arrays increases dramatically.
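The break-even point is easiest to apply as an IO density: 7,000 IOPS on 10 usable terabytes is 700 IOPS per terabyte, the crossover shown in Figure 1. The sketch below combines that figure with the Action Item's guideline of including hybrid storage in the RFP at or near 500 IOs per terabyte. Both thresholds depend on Wikibon's Table 3 assumptions, treating "near 500" as ">= 500" is an assumption, and the function names are hypothetical.

```python
# Thresholds taken from the text; both depend on Wikibon's Table 3 assumptions.
BREAK_EVEN_IOPS_PER_TB = 7_000 / 10   # 7,000 IOPS on 10 usable TB = 700 IOPS per TB
RFP_THRESHOLD_IOPS_PER_TB = 500       # Action Item: include hybrid in the RFP near/above this


def io_density(iops, usable_tb):
    """IO density in IOPS per usable terabyte."""
    if usable_tb <= 0:
        raise ValueError("usable capacity must be positive")
    return iops / usable_tb


def hybrid_guidance(iops, usable_tb):
    """Rough placement of a workload relative to the cost crossover."""
    density = io_density(iops, usable_tb)
    if density > BREAK_EVEN_IOPS_PER_TB:
        return "hybrid expected to be lower cost"
    if density >= RFP_THRESHOLD_IOPS_PER_TB:   # treating "near" as >= 500 (an assumption)
        return "include hybrid in the RFP"
    return "traditional array likely lower cost"


print(hybrid_guidance(15_000, 10))
```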

Figure 2 - Comparative Cost & Response Time for Traditional & Hybrid Storage Arrays as IOPS Increase for 10TB Usable Storage
Source: Wikibon, February 2013

Response Time Analysis

The most visible advantage of the hybrid approach is the improvement in both the average and the variance of IO response time. For true hybrid storage, the response time ranged from 1ms at 1,000 IOPS to 2ms at 20,000 IOPS. For traditional storage without a flash cache, the response time was about 9ms; with a flash cache, it ranged from about 4ms to 6ms. IO response time becomes very important in workloads with databases, and in workloads with shifting peaks of high IO activity (e.g., VDI, Citrix, or VMware implementations, which almost always have IO storms at various times of the day).

Wikibon has shown that latency-sensitive workloads such as databases can be implemented more efficiently if IO latency is reduced to under 2ms and the variance in IO latency is reduced as much as possible (the probability of an IO latency above 8ms is below 1%). When databases such as Microsoft SQL Server or Oracle Standard Edition are in use, reducing IO latency and increasing server DRAM creates the potential to reduce the number of cores required to provide the necessary performance. A planned virtual database system with eight cores, for instance, could be reduced to five cores by optimizing the server and storage technology. With a SQL Server or Oracle database license costing between $15,000 and $20,000 per processor core, the three cores saved represent a total cost avoidance of $45,000 to $60,000. The three freed cores could be used either to provide a three-core failover system using vMotion or to implement another database system.
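The latency target and the license arithmetic above can be checked with two small functions. This is a sketch, assuming "sub 2 ms" refers to mean latency and that latency samples are available per IO; the function names are hypothetical.

```python
def meets_latency_target(latencies_ms, mean_target_ms=2.0, tail_ms=8.0, tail_limit=0.01):
    """True if mean latency is under 2 ms and fewer than 1% of IOs exceed 8 ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    mean = sum(latencies_ms) / len(latencies_ms)
    tail_fraction = sum(1 for x in latencies_ms if x > tail_ms) / len(latencies_ms)
    return mean < mean_target_ms and tail_fraction < tail_limit


def license_cost_avoided(planned_cores, optimized_cores, per_core_usd):
    """License cost avoided by running on fewer cores."""
    return (planned_cores - optimized_cores) * per_core_usd


# Eight planned cores reduced to five, at $15,000 and $20,000 per core
low = license_cost_avoided(8, 5, 15_000)
high = license_cost_avoided(8, 5, 20_000)
print(f"${low:,} to ${high:,}")  # $45,000 to $60,000
```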

Low-latency storage also significantly reduces the effort DBAs and system administrators spend managing the system. In our conversations, hybrid users indicated that they spent fewer than five hours a week on administration, compared with two to four times as much for traditional arrays using flash as a cache. Low latency, ease of administration, and the availability of a complete performance profile by VM were cited as the major factors in eliminating work.

Action Item: CIOs, storage executives, and CTOs of SMBs or divisions of large companies should focus on optimizing infrastructure to minimize infrastructure software expenditure, including database software. Low-latency storage is the most important component in a virtualized environment. Low-latency VM-aware storage will minimize the effort required to manage systems, allow more virtual machines to run, and reduce the number of cores required to run database environments.

CIOs should ensure that the organization does not impede applying low latency to reduce infrastructure software budgets and improve end-user productivity. For all storage environments where the IO rate is near or greater than 500 IOs per terabyte, low-latency solutions such as hybrid storage systems should be included in the RFP.

Footnotes: The following tables, assumptions and calculation notes are the basis for Figure 1 and Figure 2, and the examples in the posting.

Table 1 - Total Cost & Response Time for Traditional Storage Arrays as IOPS Increase
Source: Wikibon, February 2013

Table 2 - Total Cost & Response Time for Hybrid Storage Arrays as IOPS Increase
Source: Wikibon, February 2013

Table 3 - Assumptions for Total Cost & Response Time for Traditional and Hybrid Storage arrays
Source: Wikibon, February 2013

Notes on calculations:

Net Controller Cache Hit Rate is calculated from an assumed 70% cache hit rate without virtualization, reduced by a 15% impact from VMware. The net result is 70% × (1 - 15%) = 59.5% ≈ 60%;
Flash-cache hit rate uses a maximum of 60% for each size of flash cache (Source: NetApp Flash Cache best practices), reduced by 2% for each additional 1,000 IOPS;
Overall hit rate is calculated as Net Controller Hit Rate + (Flash Cache Hit Rate × (1 - Net Controller Hit Rate));
Disk IOPS required is IOPS required × (1 - overall hit rate);
Total drives required is the minimum number of 1, 2, or 3TB drives required to meet the capacity and performance requirements;
Disk IO utilization is calculated as Disk IOPS required ÷ (total drives × average IOPS per disk (= 80));
Total cost is calculated as the sum of the disk components (# drives × $2,000) and the cost of flash (# gigabytes × cost/GB of SLC or MLC flash);
Response Time (RT) is calculated as (% of flash IOs × flash RT) + (% of disk IOs × Disk RT × (1 + (Disk IO Utilization ÷ (1 - Disk IO Utilization)))); the utilization factor, equivalent to 1 ÷ (1 - Disk IO Utilization), models queuing delay at the disks.
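The calculation notes above can be expressed as a small Python model. This is a sketch, not Wikibon's spreadsheet: Table 3's assumptions are not reproduced here, so the response-time inputs (0.5ms flash, 5ms disk), the 80% planning ceiling on disk utilization, the 1,000-IOPS load at which the 60% flash-cache hit rate applies, and the reading of "-2%" as two absolute percentage points are all hypothetical, and RAID overhead is ignored. Its outputs should not be expected to match Tables 1 and 2.

```python
import math

AVG_IOPS_PER_DISK = 80   # from the calculation notes
COST_PER_DRIVE = 2_000   # dollars per drive, from the calculation notes


def net_controller_hit_rate(base_hit=0.70, vmware_impact=0.15):
    """70% controller-cache hit rate reduced by a 15% VMware impact (~60%)."""
    return base_hit * (1 - vmware_impact)


def flash_cache_hit_rate(iops, max_hit=0.60, base_iops=1_000, drop_per_1000=0.02):
    """60% maximum, minus 2 points per additional 1,000 IOPS.
    base_iops (the load at which 60% applies) is a hypothetical assumption."""
    extra_thousands = max(0, iops - base_iops) / 1_000
    return max(0.0, max_hit - drop_per_1000 * extra_thousands)


def overall_hit_rate(controller_hit, flash_hit):
    """The flash cache only sees IOs that miss the controller cache."""
    return controller_hit + flash_hit * (1 - controller_hit)


def disk_iops_required(iops, overall_hit):
    return iops * (1 - overall_hit)


def min_drives(usable_tb, disk_iops, drive_tb, max_utilization):
    """Fewest drives meeting both capacity and performance.
    max_utilization is a hypothetical planning limit; RAID overhead is ignored."""
    by_capacity = math.ceil(usable_tb / drive_tb)
    by_performance = math.ceil(disk_iops / (AVG_IOPS_PER_DISK * max_utilization))
    return max(by_capacity, by_performance)


def disk_utilization(disk_iops, drives):
    return disk_iops / (drives * AVG_IOPS_PER_DISK)


def total_cost(drives, flash_gb, flash_cost_per_gb):
    return drives * COST_PER_DRIVE + flash_gb * flash_cost_per_gb


def response_time_ms(flash_fraction, flash_rt_ms, disk_rt_ms, utilization):
    """Cache-served IOs at flash RT; disk IOs at disk RT x (1 + U/(1-U))."""
    if not 0 <= utilization < 1:
        raise ValueError("disk utilization must be in [0, 1)")
    queuing_factor = 1 + utilization / (1 - utilization)
    return flash_fraction * flash_rt_ms + (1 - flash_fraction) * disk_rt_ms * queuing_factor


# Example: traditional array with flash cache, 15,000 IOPS on 10 usable TB of 1TB drives
iops = 15_000
hit = overall_hit_rate(net_controller_hit_rate(), flash_cache_hit_rate(iops))
disk_iops = disk_iops_required(iops, hit)
drives = min_drives(usable_tb=10, disk_iops=disk_iops, drive_tb=1, max_utilization=0.8)
util = disk_utilization(disk_iops, drives)
rt = response_time_ms(hit, flash_rt_ms=0.5, disk_rt_ms=5.0, utilization=util)
print(f"{drives} drives, {util:.0%} disk utilization, {rt:.1f} ms average response time")
```

The queuing factor is what drives the traditional array's cost curve upward in Figure 2: as the flash-cache hit rate falls with rising IOPS, more IO reaches disk, and keeping utilization (and therefore response time) in check requires adding drives.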

Comments on 'Hybrid Storage Poised to Disrupt Traditional Disk Arrays'

This is a good article, Dave, and I concur with a lot of the calculations. A few areas that could be explored more are:

1) The benefit of a Unified Hybrid Approach. The more workloads you consolidate the more benefit a hybrid environment delivers

2) The role of the caching architecture. These vary a lot in both hardware and caching software in hybrid storage environments, and can make a tremendous difference to the percentage of SSD you need for a set amount of data and to the performance you get

3) The innovation in Storage pooling in some of these hybrid architectures. They are breaking down the old RAID controller constructs and delivering a much more flexible and manageable storage pool. I discuss this in my blog entitled "The death of RAID" http://blog.starboardstorage.com/blog/bid/148658/The-Death-of-RAID

Posted By:lee| Wed Feb 06, 2013 12:41
Hi Lee Thanks for your comments:

I agree in general with your proposition that the more consolidation the better, assuming that the workloads are a good fit for the functionality/cost of the array. Sometimes file systems need the same features; other times it is overkill. For small SMB organizations it is often a benefit to support file & block on the same box. All the hybrid vendors (Tegile, Tintri, Starboard) except Nimble support block and file.

I do not agree that caching is the best long-term approach for hybrid architecture. It was OK when flash was much more expensive, and a read-only cache with a small amount of flash gives some relief. However, the disadvantage of this approach is that the variance of the IO times goes to hell in a handbasket. A "flash-first" architecture is superior when dealing with higher percentages of flash.

RAID is certainly dead, but in my view erasure coding is the solution, not storage pooling.

Posted By:David Floyer| Thu Feb 07, 2013 03:00
Dave, I think you will find Tintri is NFS for VMware only. They are not SAN and NAS, and they do not even support all VM architectures. They do not support Fibre Channel either.
On the hybrid architecture and caching, the whole purpose of caching is to reduce the amount of SSD required. A typical Starboard Storage sale has 100 times the HDD capacity that it has in SSD read cache capacity. You would never have more than 5% of your capacity as SSD. If you are looking for 200K-plus IOPS for an app that requires just a few TB of storage, then go buy all flash. However, if you are looking to lower the cost of your capacity 3-5X and increase performance by 2-5X over traditional storage systems, and have 10s to 100s of TB, the hybrid platforms using caching will capture the bulk of workloads. It is all about how good your caching algorithms are, and that is the measure of a hybrid storage system, not just the fact that it supports multiple protocols. We designed a system that went beyond the simple recency and frequency metrics found in ZFS-based ZIL and L2ARC caches. Also, we are a block-level array that can handle file. None of the others you mention here are block arrays that can also handle file; they are file-based and as such not what a database admin would typically look to.

Posted By:lee| Thu Feb 07, 2013 04:40
Dave's article prompted me to put down my thoughts in a blog post on why Unified Hybrid Storage will win:

http://blog.starboardstorage.com/blog/bid/267010/why-unified-hybrid-storage-will-win-the-storage-war-in-real-world-customer-environments?source=Blog_Email_[Why%20Unified%20Hybrid%20S]

Posted By:lee| Fri Feb 08, 2013 11:48
Great article, Dave.

FYI, in Figure 1, I think you may have accidentally switched the traditional and hybrid cost curves.

Posted By:Ed Lee| Wed Feb 13, 2013 04:57
Thanks Ed - I corrected the chart. I also corrected the overall hit rate calculation description in the footnotes.

David

Posted By:David Floyer| Thu Feb 14, 2013 08:33
