The announcement of EMC’s FAST Version 1 storage management software (Fully Automated Storage Tiering) has put a spotlight on automated tiered storage (ATS). ATS offers online migration of data without downtime, completely transparently to any application. Many solutions are already in the marketplace, with more coming.
The fundamental business value of an Automated Tiered Storage management (ATS) system is being able to demote data to a lower tier safely and reduce storage costs. Safely means that when a data volume is demoted, the applications using it still function within their SLAs, and users are happy. On an exception basis it may be necessary to promote data to a higher performing tier.
An Automated Tiered Storage management system has three components:
- An ability to move volumes dynamically within (or ideally between) arrays. This is usually facilitated by a virtualization layer that separates logical and physical constructs.
- A software overlay to set policies, gather and store information, execute those policies, and monitor success.
- A small amount of additional storage space to execute the transfers.
The ability to move data dynamically and completely non-disruptively has been available for many years within arrays, with some solutions offering between-array migrations. However, the work to move volumes manually is time-consuming and risky, and there is not much upside for storage administrators. The availability of software to automate the process is important to reduce the storage administration effort and minimize risk of failure.
Measuring ATS Success
Success in automated tiered storage management can be measured by two metrics:
- Cost savings achieved when data is successfully demoted to a lower tier, allowing the substitution of less expensive media in new equipment purchases (measured as the number of lower-tier disks bought). A good result is avoiding the purchase of a large number of low-capacity, high-performance drives.
- Number of data volumes promoted to a higher tier, implying that SLAs were not being met and users were unhappy (measured as the percentage of volumes promoted). A good result is a very low percentage.
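As a rough illustration of the second metric, the percentage of volumes promoted can be tracked per reporting period. This is a minimal sketch; the function name and the sample figures are hypothetical and do not come from any ATS product.

```python
def promotion_rate(volumes_promoted: int, volumes_managed: int) -> float:
    """Percentage of ATS-managed volumes that had to be promoted in a period.

    A low value suggests demotions are not breaking SLAs.
    """
    if volumes_managed <= 0:
        raise ValueError("volumes_managed must be positive")
    return 100.0 * volumes_promoted / volumes_managed


# Hypothetical month: 3 promotions out of 200 managed volumes.
rate = promotion_rate(3, 200)
print(f"{rate:.1f}% of volumes promoted")
```

A rate that creeps upward over successive periods would be a signal to revisit the demotion policies rather than to schedule more promotion runs.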
The second metric is particularly important in an automated environment. IT professionals do not like ceding control, and users are initially suspicious of automation and black magic. IT management should focus on establishing confidence that the system will “do no harm” as a premise for widespread adoption.
It is also important to understand the limitations of tiered storage management. ATS does not provide real-time management. Non-disruptive movement of a large volume consumes significant resources over a significant elapsed time. I/O subsystems always have second-to-second, hour-to-hour, and day-to-day variations in contention as the application demands change. Some variations have a degree of predictability (e.g., month-end processing); others are highly unpredictable. Storage arrays provide real-time storage optimization to minimize the impact of this type of contention. For example, cache optimization in the storage array is real-time. These well-tested algorithms help utilize an expensive resource efficiently and improve I/O speed and response time. If flash technology were part of a storage hierarchy (for example as a store-through persistent cache that extends the array cache), movement in and out should be done by the array optimization functionality in real time.
ATS is not an archiving system. An ATS works with volume metadata to determine the frequency and type of access to data, but that is insufficient classification for an archiving system.
However, the movement of volumes between any of the tiers (flash drives, FC drives or SATA drives) should rarely be done in reaction to a real-time problem. One of the limitations of automatic movement is hunting. An application works for a time with very little I/O, and then a single event increases the I/O dramatically. The automated movement reacts by promoting the volume after the problem has already happened; the application then reverts to its normal I/O activity, and the volume gets moved out again later.
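The hunting effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the I/O samples, the threshold, and the use of a median over a trailing window are all hypothetical choices made for illustration, not the behavior of any particular ATS product.

```python
from statistics import median


def count_moves(iops_samples, threshold, window):
    """Count tier changes when placement follows the median of the
    last `window` samples (window=1 means reacting to every sample)."""
    tier = "low"
    moves = 0
    for i in range(len(iops_samples)):
        recent = iops_samples[max(0, i - window + 1): i + 1]
        wanted = "high" if median(recent) > threshold else "low"
        if wanted != tier:
            tier = wanted
            moves += 1
    return moves


# A quiet volume with a single one-off burst of I/O.
samples = [50] * 10 + [5000] + [50] * 10

reactive = count_moves(samples, threshold=1000, window=1)
windowed = count_moves(samples, threshold=1000, window=7)
print(reactive, windowed)
```

With these inputs the reactive setting moves the volume up and back down again, while the longer window ignores the burst entirely, which is the argument for long evaluation periods.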
ATS vendors have started to include sub-LUN movement of data within volumes as a feature, identifying blocks of data that are high activity and could be moved to a higher storage tier. This is a new technology, and there is not enough published information or user experience to guide how it should be approached. It is potentially useful for blocks that could reside on a small amount of SSD storage. Time will tell whether this functionality should reside within the ATS software, within the storage controller operating system, or be integrated across both. Wikibon will update this analysis when more information is published from vendors and more user experience becomes available.
Cost Justification of ATS
The cost justification of ATS is not complex. The cost elements are:
- The cost of software for the array ($2,000 entry cost, $5,000 for a small array, $50,000 for a large enterprise array),
- Implementation costs (set-up of policies, etc) - $2,000-$10,000 per array,
- Software maintenance costs (~20% of purchase price/year),
- Ongoing monitoring and reporting (a few hours/week),
- Total three-year cost for a small array - $12,000,
- Three-year cost for a large array - $120,000.
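The three-year totals can be reconstructed roughly from the elements above. This is a hedged sketch: the monitoring figures ($2,000 and $30,000) are assumptions back-derived so that the sums match the stated totals, the implementation costs are taken from the ends of the quoted range, and the function name is hypothetical.

```python
def three_year_cost(software, implementation, monitoring,
                    maintenance_pct=20, years=3):
    """Total cost of ownership in dollars over `years`.

    Maintenance is charged per year as a percentage of the software price.
    `monitoring` is the total staff cost over the period (an assumption,
    since the source only says "a few hours/week").
    """
    maintenance = software * maintenance_pct * years // 100
    return software + implementation + maintenance + monitoring


# Small array: $5,000 software, $2,000 implementation, assumed $2,000 monitoring.
small = three_year_cost(5_000, 2_000, 2_000)
# Large array: $50,000 software, $10,000 implementation, assumed $30,000 monitoring.
large = three_year_cost(50_000, 10_000, 30_000)
print(small, large)
```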
The benefit elements are:
- The reduction in disk price for planned disk purchases. A rule of thumb is that an additional disk on an array costs $2,000 + 20% maintenance. Replacing two planned FC disks with one SATA disk can achieve a saving of at least 50% in the cost of additional storage. The impact of additional storage space to execute the data transfers is included in this savings estimate.
- An ATS does not save on I/Os, so the remaining costs of the array do not change significantly.
Figure 1 below shows the financial analysis for implementing ATS on a small disk array (100 disks). It shows a break-even of about 16 months and an ROI >120%. The risks for an ATS project are small: the software can be taken out and the processes can revert without much difficulty. The probability of project success is good. The business case for larger arrays would be close to linear according to the number of disks.
Break-even for a small array is buying 12 SATA drives instead of 12 FC drives, and for a large array 120 SATA drives. This is supported by the experience from current ATS users, who report that after an ATS implementation, the majority of future disk purchases are SATA drives.
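The break-even drive counts follow from simple arithmetic. The sketch below assumes a saving of $1,000 per substituted drive (50% of the $2,000 rule-of-thumb disk price, ignoring disk maintenance); that per-drive figure is an assumption chosen because it reproduces the 12 and 120 quoted above, and the function name is hypothetical.

```python
import math


def break_even_drives(ats_three_year_cost, saving_per_drive=1_000):
    """Number of drive substitutions needed to recover the ATS cost."""
    return math.ceil(ats_three_year_cost / saving_per_drive)


small_array = break_even_drives(12_000)    # three-year cost for a small array
large_array = break_even_drives(120_000)   # three-year cost for a large array
print(small_array, large_array)
```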
Cost justification for ATS software is unlikely for arrays that are fully populated, for arrays that can only have one type of drive, or for arrays that support a single application. Low growth rates are a contra-indication for deploying ATS technology. Users should be very wary of pricing linked to the number of terabytes installed, as this could negate the benefits of using very large capacity drives.
ATS Systems Available
The following table is a review of the Automated Tiered Storage Systems that are available in the marketplace.
Wikibon has talked to Compellent and Hitachi ATS users in depth, and in general users are very happy with the results. Our main finding was that after a time, they let the system take over, and the most important benefit was being able to upgrade the arrays with only (or mainly) high-capacity drives.
Guideline for ATS Implementation
The main focus of best practice for in-box tiered storage management is creating a step-by-step process that will allow volumes to be demoted safely.
- Keep the tiers simple: The number of tiers should be small (2 to 3 tiers), and the definitions simple. A typical three-tier system could have 15K FC, 10K FC and SATA. If flash drives are available, then it is probably sensible to limit the tiers to Flash, FC and SATA.
- Keep the evaluation period before volumes are migrated down long – very long. It is good to keep the evaluation period over at least a month-end, so that data supporting mission-critical applications are not demoted.
- Run the movement part of the process at a quiet time for I/O on a batch basis – once or twice a month should be sufficient in most cases. Special runs for promotion may be needed, but if so it is a strong indicator that the policies have been set wrongly.
- Ensure that any volumes that are needed very infrequently, but that need high performance when they are used, are excluded from the automated tiered storage policy. Examples include volumes used by recovery procedures, end-of-month/quarter applications, and in particular end-of-year applications.
- Make the initial determination of where volumes reside simple. One policy could be that all new applications start out on FC storage, and then migrate down to SATA as and when the ATS software determines. This follows the natural progression that access to data is at a maximum during the first few months of its life and then declines rapidly.
Action Item: Automated tiered storage solutions have been implemented for a number of years and are now supported by many vendors. If considering implementing ATS on newly available software, users should implement a pilot project and be cautious about including mission-critical arrays until the software has matured. Users should determine the cost of implementation per array and the expected savings in additional storage costs over the life of the array. Good tools and processes that monitor success are essential to long-term adoption.