Storage Peer Incite: Notes from Wikibon’s July 29, 2008 Research Meeting
MAID (Massive Array of Idle Disks) has yet to capture the market excitement that dedupe and server virtualization enjoy. Yet MAID offers a superior solution for storing data that is seldom used but, when needed, can be accessed reasonably quickly. For a variety of reasons the volume of data stored in disk farms is exploding. Much of that data is seldom if ever read, yet it is still deemed as needed -- perhaps for regulatory reasons or to meet discovery requirements in potential lawsuits, perhaps also to mine for potential business opportunities. MAID can preserve that accessibility and the ability to do random search through all that data while cutting energy costs dramatically, saving the organization money and decreasing its carbon footprint simultaneously. Potentially MAID could have an impact on the cost of burgeoning disk farms and could provide infrastructure for inactive tiers within data stores. G. Berton Latamore
Massive arrays of idle disks (MAID) are high-capacity, lower-cost disk arrays for storing less active enterprise data and saving energy. The primary value proposition of MAID is lower operational cost, which stems from its capability to power down a portion of the drives within the array, thereby lowering power consumption. Because spinning disk drives typically account for 80% of a storage array’s power consumption, MAID is, in concept, an effective technology for greening storage. Moreover, 90% of organizational data that is more than three months old, and 70%-80% of all data in the organization overall, is inactive or never accessed.
The basic concept of MAID platforms is to group and store data based on access frequency, placing rarely accessed data on devices that are turned off. The concept of MAID was pioneered by Copan Systems, and early criticisms of MAID included concerns about shutting down enterprise disk drives, which unlike laptop devices are engineered to be always on. Mainstream manufacturers have begun to introduce MAID-like features into disk arrays, lending credibility to the concept, although there appears to be some debate about the exact definition of MAID -- a key distinction being an architecture designed specifically to accommodate shutting down a proportion of drives completely.
Two main issues drive MAID and spin-down adoption:
- Cost. Increasing amounts of data in the enterprise are ‘tier inactive’ candidates for placement on SATA devices. Historically, this information would be stored on tape, which consumes little or no energy, but the prevalence of SATA devices for tier 3 and 4 applications makes it increasingly cost-effective to move data to an idle storage tier;
- Technology innovations.
Which technology innovations are noteworthy in MAID?
There are two main sources of technology innovation around MAID and spin-down: 1) hard drive manufacturers and 2) array vendors. Device manufacturers including Hitachi, Seagate, Western Digital, Samsung, and Fujitsu have all announced drives with varying power-management settings that are invoked through software commands.
In concept, users should consider five main points on the power management spectrum when evaluating MAID and spin-down solutions:
- Normal – for online active data – no power savings;
- Park – which parks the disk drive’s heads while the platters continue to spin – 20-24% power savings;
- Slow Down – a mode that decelerates the spin speed of the disk platters from 7,200 RPM to 3,600 RPM – ~50% power savings;
- Sleep – a standby mode – which can deliver up to 85% power savings;
- Off – shuts down the disk device – effectively 100% savings.
[Note: These figures are savings at the device, not the array level].
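The distinction between device-level and array-level savings can be made concrete with a little arithmetic. The following is a rough sketch, not a vendor model: it assumes the drives draw 80% of array power (the figure cited earlier) and that everything else in the array draws constant power. The function name, the 0.22 midpoint for Park, and the example drive mix are illustrative assumptions.

```python
# Device-level savings per power mode, taken from the list above.
# "park" uses 0.22, the midpoint of the quoted 20-24% range (an assumption).
DEVICE_SAVINGS = {
    "normal": 0.0,
    "park": 0.22,
    "slow_down": 0.50,
    "sleep": 0.85,
    "off": 1.00,
}


def array_savings(mode_mix, drive_share=0.80):
    """Estimate the fraction of total array power saved.

    mode_mix maps a power mode to the fraction of drives in that mode
    (the fractions must sum to 1). drive_share is the fraction of array
    power drawn by the drives; controllers, fans, and power supplies are
    assumed to draw constant power, which is a simplification.
    """
    total = sum(mode_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("drive fractions must sum to 1")
    device_level = sum(
        DEVICE_SAVINGS[mode] * fraction for mode, fraction in mode_mix.items()
    )
    return drive_share * device_level


if __name__ == "__main__":
    # Hypothetical mix: a quarter of the drives active, the rest powered off.
    mix = {"normal": 0.25, "off": 0.75}
    print(f"Estimated array-level savings: {array_savings(mix):.0%}")
```

Under these assumptions, even a drive that is fully powered off contributes only its share of the drives' 80%, so array-level savings are always lower than the device-level figures in the list.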
Recently, a number of array suppliers including EMC, Nexsan, Hitachi, and DataDirect Networks have introduced features within disk arrays that, through software, allow administrators to take advantage of some of these different power settings and apply parameters such as time-of-day on a drive-by-drive basis. Nexsan in particular offers a wide range of MAID levels. Copan has introduced intelligent software that proactively manages spin-up of idle disks (i.e. disk aerobics) to ensure devices are exercised and that data are migrated off devices that are high probability candidates for failure.
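To illustrate what a drive-by-drive policy with a time-of-day parameter might look like, here is a minimal sketch. It does not reflect any vendor's actual interface; the `DrivePolicy` fields, the thresholds, and the rule that a drive stays in normal mode during a busy window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivePolicy:
    """Hypothetical per-drive policy; all thresholds are illustrative."""
    busy_start_hour: int   # start of the window in which the drive stays normal
    busy_end_hour: int     # end of that window (exclusive)
    park_after_min: int    # idle minutes before parking the heads
    sleep_after_min: int   # idle minutes before entering sleep
    off_after_min: int     # idle minutes before powering off


def select_mode(policy, hour, idle_minutes):
    """Pick a power mode for one drive from the hour of day and its idle time."""
    if policy.busy_start_hour <= hour < policy.busy_end_hour:
        return "normal"  # never power-manage during the busy window
    if idle_minutes >= policy.off_after_min:
        return "off"
    if idle_minutes >= policy.sleep_after_min:
        return "sleep"
    if idle_minutes >= policy.park_after_min:
        return "park"
    return "normal"


if __name__ == "__main__":
    policy = DrivePolicy(busy_start_hour=8, busy_end_hour=18,
                         park_after_min=10, sleep_after_min=60,
                         off_after_min=240)
    for hour, idle in [(9, 500), (22, 15), (22, 90), (22, 500)]:
        print(hour, idle, select_mode(policy, hour, idle))
```

Checking the longest idle threshold first keeps the rule order unambiguous: a drive idle long enough to be powered off is never merely parked.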
What are the main constraints of MAID/Spin-down?
The Wikibon community sees two main barriers to adoption with MAID and spin-down:
- The lack of good data classification practices. Candidates for MAID placement generally should be isolated from, for example, thinly provisioned volumes spread across many devices, which limit the effectiveness of MAID;
- The lack of MAID-aware file systems. Long delays mean application time outs. Today, there are few MAID-aware operating systems and file systems, except for specialized archive software and virtual tape library (VTL) solutions with specific extensions. Less aggressive power management feature levels, while not offering as great a power savings as MAID, avoid error handling/timeout problems and allow for wider adoption.
Advice for users
Spin-down generally and MAID specifically are maturing and increasingly becoming logical fits for VTL applications including backup and recovery, archiving applications, the inactive tier of a tiered storage management system, large sequential applications (e.g. scientific and entertainment), and even general purpose file systems such as NFS and CIFS (especially with power modes that are more spin-up friendly, e.g. Nexsan).
Organizations should think of tiered storage in multiple dimensions, including: 1) data value and 2) access frequency. Begin grouping data by I/O activity and ensure the existence of a ‘no activity’ group of data.
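Grouping by I/O activity can start from something as simple as last-access age. The sketch below covers only the access-frequency dimension, not data value. The group names and thresholds are hypothetical; the 90-day boundary merely echoes the "more than three months old" observation earlier in this note.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, ordered from most to least active.
ACTIVITY_GROUPS = [
    ("active", timedelta(days=7)),
    ("warm", timedelta(days=90)),
    ("cold", timedelta(days=365)),
]


def activity_group(last_access, now):
    """Return the I/O activity group for a data set given its last access time."""
    age = now - last_access
    for name, limit in ACTIVITY_GROUPS:
        if age <= limit:
            return name
    return "no_activity"  # candidate for the inactive (MAID) tier


def group_by_activity(last_access_by_name, now):
    """Bucket data set names by group; a 'no_activity' group always exists."""
    groups = {name: [] for name, _ in ACTIVITY_GROUPS}
    groups["no_activity"] = []
    for name, last_access in last_access_by_name.items():
        groups[activity_group(last_access, now)].append(name)
    return groups


if __name__ == "__main__":
    now = datetime(2008, 7, 29)
    data_sets = {
        "orders_db": now - timedelta(days=1),
        "q1_reports": now - timedelta(days=120),
        "2003_email_archive": now - timedelta(days=1500),
    }
    for group, names in group_by_activity(data_sets, now).items():
        print(group, names)
```

Creating the "no_activity" bucket up front, even when it is empty, mirrors the advice to ensure that such a group exists before deciding what belongs on an inactive tier.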
If aggressive exploitation of sleep or off mode is a goal to reduce power consumption, users should choose a MAID-aware application or solution, but be aware that in many cases not all devices can be powered up simultaneously.
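The constraint that not all devices can be powered up at once means spin-up requests may have to queue. The following minimal sketch shows the idea with a fixed cap on concurrently spinning drives. The class name, the `max_active` parameter, and the first-come-first-served order are assumptions for illustration, not a description of any product.

```python
from collections import deque


class SpinUpScheduler:
    """Admit spin-up requests while keeping at most max_active drives spinning."""

    def __init__(self, max_active):
        if max_active < 1:
            raise ValueError("max_active must be at least 1")
        self.max_active = max_active
        self.active = set()     # drives currently spinning
        self.waiting = deque()  # drives waiting for power budget, oldest first

    def request(self, drive_id):
        """Return True if the drive is spinning now, False if it was queued."""
        if drive_id in self.active:
            return True
        if len(self.active) < self.max_active:
            self.active.add(drive_id)
            return True
        if drive_id not in self.waiting:
            self.waiting.append(drive_id)
        return False

    def release(self, drive_id):
        """Spin a drive down and start the oldest waiting drive, if any.

        Returns the drive that was started, or None.
        """
        self.active.discard(drive_id)
        if self.waiting and len(self.active) < self.max_active:
            next_drive = self.waiting.popleft()
            self.active.add(next_drive)
            return next_drive
        return None


if __name__ == "__main__":
    scheduler = SpinUpScheduler(max_active=2)
    for drive in ["d1", "d2", "d3"]:
        print(drive, "spinning" if scheduler.request(drive) else "queued")
    print("started after release:", scheduler.release("d1"))
```

An application that receives a queued result has to wait for a release, which is one reason full restores across many drives can be slow on systems of this kind.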
Organizations looking to manage less frequently accessed storage intelligently could in theory roll their own solution using robust archive software and low cost disk and green tape to improve efficiency. MAID is, however, a packaged solution that if deployed correctly can simplify the objective and lower integration risks, albeit at a price premium.
Action item: The bottom line on energy consumption is that getting rid of stuff is always the best approach to lowering the energy bill. Start by classifying data and setting policies to delete information that is not needed. Include an inactive tier that can exploit MAID/spin-down and consider storing more data on that tier, but be sure to remove older hardware in the process.
While vendors have done a great job of confusing the market, according to SNIA, Massive Arrays of Idle Disks (MAID) is “A storage system comprising an array of disk drives that are powered down individually or in groups when not required. MAID storage systems reduce the power consumed by a storage array.”
MAID systems generally fall into two categories: those that can power all the disks at the same time and those that can’t. While the latter, with Copan being the leader, offer the most power savings, some users will find more comfort in the knowledge that, if needed, all the disks can be operational at the same time, for example for a full data restore. The sweet spot for Copan-like MAID is applications or data that do not require high performance in terms of IOPS or bandwidth but that need faster access to individual files than magnetic tape can provide, together with the associated energy savings. However, for time-sensitive and high-bandwidth operations, including DR or full database restore, tape or regular disk can be a better approach.
The industry is evolving past the first generation of MAID systems, adding intelligent power management (IPM) features that exploit the power-saving features first implemented in laptop drives and now being introduced into enterprise-class drives. As a result, users can have different levels of MAID with different power savings and response times. That said, most second generation systems still simply power off select drives. Why? Because it is difficult for a storage subsystem to classify the data. Nonetheless, users can expect more vendors, including mainstream suppliers, to offer tiered MAID storage.
When MAID technology is being considered, users must still classify their data and try to match it to the right MAID technology as follows:
- Align MAID technology, including MAID level, to the applicable tier and application needs.
- Understand the performance tradeoffs of using MAID as an alternative to traditional disk and tape.
- Look beyond energy savings and factor in cost of acquisition and site prep along with TCO and ROI.
- If using MAID for backup, investigate how much sustained throughput the system can handle – especially for restores.
Another opportunity to consider is to make applications aware of storage latency. One recent example is a major credit card processor that stores six months’ worth of statements online but allows the customer to request statements that are older. That request takes up to 24 hours to fulfill. Clearly the older statements are on offline or nearline storage. If MAID were employed, the request could probably be satisfied in minutes or less, resulting in a better customer experience. Note that after 9/11, many users made their applications aware of the storage topography, i.e., that there is a mirror at another site, so it is not inconceivable that more applications will become aware of various storage latencies and even exploit them.
Action item: Given performance or other service requirements, not all storage or data applications lend themselves to MAID. However, given the compelling energy savings and asset life extension possibilities of MAID, users should consider it, albeit very carefully.
Classifying data is the mainspring for many IT initiatives, including those related to tiered storage generally and MAID specifically. There are two primary organizational issues with regard to planning for MAID: 1) Determining how much data are actually candidates for MAID placement based on access frequency; and 2) Determining the business value of building MAID awareness into applications. The former exercise should largely be handled by the storage administration group with recommendations made to management including a business case for MAID based on energy savings and other TCO factors. The latter is largely an application/user group discussion to determine, for example, the ROI of enabling services that are MAID-aware and consequently provide nearline-like access to data that are write once, read infrequently (WORI).
Action item: MAID is coming to the mainstream in multiple forms, both traditional MAID that isolates rarely accessed data and turns drives off, and in other gradations including a spectrum of power management features. Organizations should task the storage management group with determining the likely percentage of data that could reside on an inactive tier in the storage hierarchy and at the same time initiate discussions with application and user groups regarding the potential benefits of building MAID-awareness into applications.
The challenges of high-performance storage systems to help applications process tier 1 and tier 2 data still remain, and traditional MAID, where drives are powered off, is unlikely to help. However, the relative growth of tier 3 & 4 data is much higher. The lack of classification systems means that the data has to be kept even though a large percentage of it is rarely accessed or modified. This “Tier Omega” (Tier Ω) can benefit substantially from emerging MAID technologies.
There are many applications that can currently take advantage of MAID technologies, such as archive; sequential applications such as scientific, geophysical, and entertainment; backup and VTL; and some NFS- and CIFS-based applications. The main barrier to wider adoption of these power-saving technologies lies with server operating systems and file systems. Unless the application “knows” that the data is coming from a power-managed disk, the system (or user) is likely to time out. From a systems point of view, applications should be able to “know” that MAID is being used and, for example, “wake up” drives if necessary.
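As a rough illustration of what "knowing" about MAID could mean for an application, the sketch below wakes a drive and waits with a timeout sized for spin-up rather than for a normal I/O. `SimulatedDrive` is a stand-in invented for this example; a real array or operating system interface would look different, and the spin-up times are placeholders.

```python
import time


class SimulatedDrive:
    """Stand-in for a power-managed drive; not a real device API."""

    def __init__(self, spin_up_seconds):
        self.spin_up_seconds = spin_up_seconds
        self.ready_at = None  # None means the drive is powered down

    def wake(self, now):
        """Start spinning up if the drive is powered down."""
        if self.ready_at is None:
            self.ready_at = now + self.spin_up_seconds

    def is_ready(self, now):
        return self.ready_at is not None and now >= self.ready_at

    def read(self, block):
        return f"data-{block}"


def maid_aware_read(drive, block, timeout, poll_interval=0.01):
    """Wake the drive if needed, wait up to `timeout` seconds, then read.

    Raises TimeoutError if the drive is not ready in time, so the caller
    can report a delay instead of failing with a generic I/O error.
    """
    start = time.monotonic()
    drive.wake(start)
    while not drive.is_ready(time.monotonic()):
        if time.monotonic() - start >= timeout:
            raise TimeoutError("drive did not spin up within the timeout")
        time.sleep(poll_interval)
    return drive.read(block)


if __name__ == "__main__":
    drive = SimulatedDrive(spin_up_seconds=0.05)
    print(maid_aware_read(drive, block=7, timeout=1.0))
```

The design choice here is that the caller, not a fixed system default, picks the timeout, which is the kind of awareness the paragraph above argues is missing from most operating and file systems.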
Action item: MAID-aware operating and file systems together with MAID-enhanced arrays can bring substantial energy savings. CTOs in organizations large and small should pressure developers in general and Microsoft in particular to implement MAID awareness for tier Ω, perhaps as part of a wider capability to interrogate where data is currently stored.
MAID technology was first introduced to the storage market in 2003 by Copan Systems. Since then, other versions of MAID have appeared. The original concept aimed primarily at the backup/recovery market as a tape replacement with the added values of energy savings, longer SATA drive life by spinning the disks less than 100% of the time, and price points that would bring disk prices much closer to tape. MAID represents a sound use of inexpensive SATA drives, and the technology and design points are solid. Five years later, MAID has made some progress but is well below the initial expectations for market penetration. So what's keeping MAID solutions from really taking off?
First of all, many users still don't understand MAID, can't describe just where it fits, and have trouble identifying applications that are best suited for MAID. Just ask them. Also, overall market visibility for MAID remains relatively low compared to the far-reaching and aggressive marketing avalanche we've seen in the past few years for de-duplication and virtual tape solutions. The general lack of solution kits, references, case studies, user groups, media coverage, and data classification tools helps keep awareness levels too low.
The incumbent in the MAID market has been and still is tape, and tape vendor marketing efforts have been decreasing in the past two years. Are they about ready to wake up? MAID vendors need to market MAID much more effectively and take advantage of the extended tape marketing siesta by providing webcasts, white papers, road shows, analyst briefings, references, testimonials, customer events, etc.
Action item: The MAID market has been intact for five years, and market penetration remains below expectations. With the tape vendors steadily reducing marketing activities, MAID vendors should seize the opportunity, before it's too late, to attack the storage market as aggressively as the de-duplication and virtual tape providers have done. For MAID, success is no longer about developing an effective solution; it's about selling an effective solution.
As part of the data classification planning exercise associated with sensible MAID initiatives, organizations have an opportunity to eliminate two things: 1) data that is unnecessary and has no business value and 2) older hardware. As the Wikibon community has often stated, the only way to lower the energy bill is to unplug hardware and get rid of stuff. In addition, the root cause of inefficient storage is not poor technology but rather inadequate data management practices.
Planning for MAID provides an opportunity to get rid of unwanted information, implement sound retention and data migration policies and consolidate (i.e. unplug) outdated hardware.
Action item: Don't allow MAID to be a band-aid to avoid addressing root cause storage and energy efficiency woes. Technologies like MAID may buy some time, but ultimately users need to classify data, destroy information that has no business value, develop and institutionalize retention policies, and implement good tiered storage management and archiving software.