Editor's Note: On January 14, 2008 EMC announced Virtual Provisioning, its version of thin provisioning. The advice in this note applies to customers of EMC's DMX as well.
A previous alert (thin provisioning: look before you leap) specifies some of the potential pitfalls of thin provisioning. This alert attempts to look at these issues from the point of view of how to avoid the pitfalls, the start of a “how-to” guide to safe provisioning. Please add additional virtualization issues found from practical experience, and how to avoid or minimize them!
Planning for a virtualized storage infrastructure with thin provisioning means that new skills have to be learnt and care taken in adopting it. Like any powerful technology, it can hurt as well as heal. Below are some of the issues and ways to minimize them. Please add many more!
- Application Support
- Many ISV packages have specific configurations that are recommended or sometimes mandated. As thin provisioning is new, it is sometimes not included among the certified, supported technologies. Many of these ISV certification restrictions (often very limiting) are routinely ignored anyway, on the basis that saving money and asking forgiveness later is the lowest-cost and least-hassle strategy. Other CIOs have refused to sign any more ISV contracts unless they support thin provisioning. At the least, infrastructure standards should be updated to require thin provisioning support in new ISV packages.
- Database vendors have on the whole been quick to support thin provisioning. For example, Oracle has shown how the integration of its 10g database software with virtual volumes, automated placement and thin provisioning can make systems simpler to set up, grow, and monitor. Storage administrators should check for parameters that (for example) write formatting blocks across all allocated storage, and change them if possible, because such writes consume physical capacity in the pool and erode the savings from thin provisioning.
- Development and test is a good place to start thin provisioning. Historically, the waste of storage in this area has been very high, and storage utilizations of 20% or less are common.
- Performance
- One of the results of a successful thin provisioning initiative is a saving of disk spindles and gigabytes, and with it the cost of storage management software, environmentals, and maybe some people. The result is an overall increase in access density, that is, more I/O per disk. As a result, the storage will sometimes have to be rebalanced. One of the most effective ways of rebalancing is inherent in the way thin provisioning is done: thin provisioning storage pools are made by wide striping across a large number of disks. The level of skew in I/Os to disk in the original volumes and disks needs to be looked at carefully. If there is high skew (i.e., a few volumes have most of the I/O and a large number have low I/O rates), then wide striping will spread the I/O more evenly across the storage and will probably compensate for the higher access density. If there is low skew (i.e., the volumes have been highly tuned and I/Os are already spread evenly), then wide striping will help less, and there will be a need to add more actuators by (for example) using lower-density disks. Careful monitoring of performance after the migration of applications to thin provisioned storage is required; one of the advantages of the virtual environment is that migration to higher-performing storage will be much easier.
- When external storage is used for virtualization and thin provisioning, an extra element is added to the data path. In the case of Hitachi, it is the USP V or VM. This adds to I/O transfer times (<1ms, according to Hitachi). It also adds the cost of the USP paths and cache to manage this transfer. However, users have often reported that performance has actually gone up, as there is more cache in the storage controller. Again, performance should be monitored, and volumes that are especially sensitive should be migrated to internal storage on a USP or equivalent device.
- There is little experience of using thin provisioning on external storage (some intrepid users have tried it out anyway, and have been using it without official support). It is important that the configuration of internal and external drives, and the setting up of the virtual storage pools be reviewed with Hitachi, Sun or HP before implementation starts.
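The skew argument above can be made concrete with a small worked example. This is a rough sketch only: the I/O rates and disk counts are hypothetical, and it assumes that wide striping spreads I/O perfectly evenly over every disk in the pool, which real arrays only approximate. It compares the busiest disk before migration (each volume on its own dedicated disks) with the per-disk load after migration to a pool with half as many spindles.

```python
def hottest_disk_iops_dedicated(volume_iops, disks_per_volume):
    """Before: each volume sits on its own disks, so the busiest
    volume determines the busiest disk."""
    return max(iops / disks_per_volume for iops in volume_iops)


def per_disk_iops_wide_striped(volume_iops, pool_disks):
    """After: all volumes share one pool and (idealized assumption)
    total I/O is spread evenly over every disk in the pool."""
    return sum(volume_iops) / pool_disks


# Hypothetical workloads, both totalling 1000 IOPS over four volumes.
high_skew = [900, 50, 25, 25]     # one hot volume, three nearly idle
low_skew = [250, 250, 250, 250]   # already tuned, evenly spread

DISKS_PER_VOLUME = 4   # 16 disks in total before migration
POOL_DISKS = 8         # thin provisioning saved half the spindles

high_before = hottest_disk_iops_dedicated(high_skew, DISKS_PER_VOLUME)
high_after = per_disk_iops_wide_striped(high_skew, POOL_DISKS)
low_before = hottest_disk_iops_dedicated(low_skew, DISKS_PER_VOLUME)
low_after = per_disk_iops_wide_striped(low_skew, POOL_DISKS)

print(f"high skew: hottest disk {high_before} -> {high_after} IOPS")
print(f"low skew:  hottest disk {low_before} -> {low_after} IOPS")
```

With these made-up numbers, the high-skew case sees its hottest disk drop from 225 to 125 IOPS even though half the disks are gone, while the low-skew case sees every disk double from 62.5 to 125 IOPS. The second case is the one where more actuators may be needed.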
- Availability
- Many operations managers do not like adding additional devices between the application and its storage; “another device that the cleaners or electricians can accidentally turn off,” as one storage administrator once said. There is a small lowering of availability from a hardware standpoint, but increases in availability are possible because of common processes and procedures for storage management software. The approach should be to look at the RPO and RTO requirements of the application and, if it is justified, bring the volumes into the USP V as internal storage.
- All storage can fail, even with RAID. Application recovery has to be designed to avoid data loss if that happens, commensurate with RPO/RTO business requirements. For example, databases can restore previous backups and apply the logs to recover the data. With thin provisioning, there are some special considerations. By placing many volumes on the same virtual pool, the I/Os will be spread across multiple disks. Should there be a failure of multiple disks, more volumes will be affected. For applications with stringent RPO/RTO requirements, it is important that log files and data files from the same application do not use the same virtual storage pool. Where available, RAID 6 would be the preferred choice of storage protection, and vulnerable data should be migrated to arrays that support RAID 6.
- Management of the virtual storage pools is important, as failure to increase the size of the storage pools in time could cause all the applications using them to fail. Organizationally, allocation of storage should be centralized in one group, usually within storage administration. This group should actively monitor the amount of over-allocation in virtual storage pools, and the storage system alerts raised when usage thresholds are reached. Sometimes additional storage will be needed quickly to allow growth of the virtual storage pools; storage acquisition procedures need to be revised and streamlined to be able to react appropriately.
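The kind of pool monitoring described above might be sketched as follows. This is an illustration, not any vendor's tooling: the `Pool` record, the 80% usage threshold, the 30-day procurement lead time and the sample figures are all hypothetical, and real arrays expose these numbers through their own management interfaces.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    physical_gb: float        # capacity actually installed in the pool
    allocated_gb: float       # sum of virtual volume sizes promised to hosts
    used_gb: float            # physical capacity already written
    growth_gb_per_day: float  # observed growth of used capacity


def overallocation_ratio(pool: Pool) -> float:
    """How many times over the physical capacity has been promised."""
    return pool.allocated_gb / pool.physical_gb


def utilization(pool: Pool) -> float:
    """Fraction of physical capacity already consumed."""
    return pool.used_gb / pool.physical_gb


def days_until_full(pool: Pool) -> float:
    """Days of headroom left at the current growth rate."""
    if pool.growth_gb_per_day <= 0:
        return float("inf")
    return (pool.physical_gb - pool.used_gb) / pool.growth_gb_per_day


def needs_action(pool: Pool, usage_threshold: float = 0.80,
                 lead_time_days: float = 30) -> bool:
    """Flag a pool that has crossed the usage threshold, or that would
    fill up before new storage could be acquired and installed."""
    return (utilization(pool) >= usage_threshold
            or days_until_full(pool) <= lead_time_days)


# Hypothetical development/test pool.
dev_test = Pool("dev_test", physical_gb=10_000, allocated_gb=30_000,
                used_gb=7_000, growth_gb_per_day=150)

print(overallocation_ratio(dev_test), utilization(dev_test),
      days_until_full(dev_test), needs_action(dev_test))
```

In this example the pool is only 70% used, below the usage threshold, yet it is still flagged because the remaining 3,000 GB would be consumed in 20 days, less than the assumed acquisition lead time. This is why acquisition procedures matter as much as the alert thresholds themselves.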
- Cost considerations
- As described above, performance requirements may demand that you use a larger number of smaller (and potentially faster) disk drives to maintain adequate performance.
- It may be appropriate (or necessary) to put database transaction log files on thin devices in dedicated storage pools to avoid availability risks. Certain types of log file that overwrite historical logs on a FIFO basis should not be placed on thin provisioned volumes. Most log files grow at predictable rates and are good candidates for thin provisioned volumes, if the performance of the pool is sufficient. As with traditional storage allocation, under no conditions should database log files ever share the same drives or storage pools as the actual databases they support.
- Depending on the availability (RPO/RTO) requirements of the applications, storage that uses RAID 5 may need to be upgraded to RAID 6. The parity overhead of RAID 6 is twice as much as RAID 5 (e.g., 6+2 instead of 7+1), and there may be performance implications for write-intensive applications.
- In some circumstances, application performance and availability objectives may require that RAID 6 storage be upgraded to fully-mirrored (RAID 1 / RAID 10) for thinly provisioned volumes to meet the RPO/RTO requirements.
- If you plan to use local and/or remote replication of thin devices as part of your business continuance and disaster recovery strategy, be sure to verify the supported configurations and space requirements of your vendor's implementation.
- If using local or remote replication for thinly provisioned volumes on externally-virtualized storage, additional cache may be required in the storage controller.
- Ensure that vendor-quoted response time overheads for external storage (such as the aforementioned figure of under 1ms for the USP V, which refers only to non-replicated, non-cached external volumes) are applicable to the target configuration.
- Review whether the backup and recovery plan for thinly provisioned volumes, particularly for externally virtualized thin device storage and any RAID 5 protected storage pools, is appropriate for the additional risk of multiple volumes failing.
- In addition to improved utilization, thin provisioning can also simplify and shorten the time required to perform storage allocation and expansion. A policy of close monitoring and coordination between the storage admins, database admins and server admins should be established to ensure that sufficient unused physical capacity is installed and available to support the expected (and unexpected) data growth rates and/or automatic database extensions.
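The capacity cost of the RAID choices discussed in this list can be worked through with simple arithmetic. The sketch below uses the 7+1 and 6+2 layouts mentioned above; the 4+4 mirrored layout and the 300 GB drive size are assumptions chosen only so that all three groups use eight drives. It covers capacity only and says nothing about the write performance differences between RAID levels.

```python
def protection_overhead(data_drives: int, protection_drives: int) -> float:
    """Fraction of raw capacity consumed by parity or mirror copies."""
    return protection_drives / (data_drives + protection_drives)


def usable_gb(data_drives: int, drive_gb: float) -> float:
    """Capacity left for data in one RAID group."""
    return data_drives * drive_gb


DRIVE_GB = 300  # hypothetical drive size

# (data drives, protection drives) for eight-drive groups
layouts = {
    "RAID 5 (7+1)": (7, 1),
    "RAID 6 (6+2)": (6, 2),
    "RAID 10 (4+4)": (4, 4),  # assumed layout: four mirrored pairs
}

for name, (data, protection) in layouts.items():
    overhead = protection_overhead(data, protection)
    print(f"{name}: {overhead:.1%} overhead, "
          f"{usable_gb(data, DRIVE_GB):.0f} GB usable of "
          f"{(data + protection) * DRIVE_GB} GB raw")
```

For an eight-drive group this gives 12.5% overhead for RAID 5, 25% for RAID 6 and 50% for mirroring, so part of the capacity saved by thin provisioning may have to be spent on the stronger protection that shared pools call for.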
Action Item: A detailed plan for moving arrays from standalone operation to being attached to a high-end array from Hitachi (or one of its distributors) should be made and reviewed. Performance monitoring should be done, particularly for early migrations. As much as possible, internal storage management staff should plan and perform these migrations, so that they understand the trade-offs that have been made and increase their skill levels and comfort with what is an exciting new technology. The same people should form the core of a centralized service to allocate storage across the data center.
Footnotes: