Originating Author: Nelson Nahum
One of the main legacies of the era when servers stored their data on direct-attached “hard drives” is that OSes and applications are written assuming that the drive is “hard”, that is, of fixed capacity. Even though for the last decade most servers and applications have stored their files in array subsystems that are flexible in size and present Logical Units (LUs), those LUs are still treated as fixed-size units, or Physical Units, by most software.
There is no reason to expect any change in this any time soon. This creates a burden with very high hidden costs for IT divisions of companies with medium to high numbers of servers. Thin provisioning ameliorates this huge cost.
This article describes how thin provisioning works, what benefits it delivers, and the factors that need to be taken into account for a successful implementation.
What is Thin Provisioning?
Thin provisioning is a storage management feature that presents the OS with a virtual drive of very large capacity even though the actual capacity behind the drive is much smaller. For example, a storage system can present a 2-terabyte (TB) LU while in reality only a few gigabytes (GB) are allocated and most of the space is virtual. As the application writes more and more data, the real storage space is expanded on the fly.
If the application reads an area that has never been written (and thus has no real storage behind it), a thin provisioned volume returns a buffer filled with zeros.
One obvious advantage is that no manual steps are needed to expand the allocated storage, since it expands automatically. A less intuitive advantage is that this method allows users to reach nearly 100% storage utilization, because every unit of real space allocated corresponds to real data written by the application.
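The allocate-on-write and read-zeros behavior described above can be sketched in a few lines. This is only an illustrative toy, not how any particular array implements it: the `ThinVolume` class, its method names, and the 4 KB extent size are all assumptions made for the example.

```python
class ThinVolume:
    """Toy thin-provisioned volume (hypothetical): real space is allocated on first write."""

    def __init__(self, virtual_size: int, extent_size: int = 4096):
        self.virtual_size = virtual_size   # capacity presented to the OS
        self.extent_size = extent_size     # allocation granularity (assumed value)
        self.extents = {}                  # extent index -> bytearray of real storage

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.virtual_size:
            raise ValueError("write outside the virtual capacity")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.extent_size)
            chunk = data[pos:pos + self.extent_size - start]
            # Allocate real space only when an extent is touched for the first time.
            extent = self.extents.setdefault(index, bytearray(self.extent_size))
            extent[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or offset + length > self.virtual_size:
            raise ValueError("read outside the virtual capacity")
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), self.extent_size)
            take = min(self.extent_size - start, length - len(out))
            extent = self.extents.get(index)
            # Never-written extents have no backing store, so they read as zeros.
            out += extent[start:start + take] if extent is not None else bytes(take)
        return bytes(out)

    @property
    def allocated_bytes(self) -> int:
        return len(self.extents) * self.extent_size


vol = ThinVolume(virtual_size=2 * 1024**4)   # presents 2 TB to the OS
vol.write(10_000, b"hello")                  # touches a single 4 KB extent
print(vol.read(10_000, 5), vol.read(0, 8), vol.allocated_bytes)
```

In this sketch the volume advertises 2 TB, yet after the write only one extent of real storage exists; everything else reads back as zeros without consuming space.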
Another important advantage of thin provisioning is that when it is combined with remote mirroring it saves disk space on both sides of the mirror in addition to saving substantial bandwidth required for building the mirror. In the initial building of block-based remote mirroring, the full volume is transmitted over the WAN. Having a thin provisioned volume allows the administrator to transmit only the real used space rather than the full volume. This can drastically reduce the time of building the mirror and enable the use of remote mirroring using low cost communication lines.
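To make the mirroring argument concrete, here is a minimal sketch of planning the initial synchronization of a thin provisioned volume. It assumes the storage system can enumerate which extents are actually allocated; the function name `plan_initial_sync` and the dictionary representation are hypothetical and chosen only for illustration.

```python
def plan_initial_sync(allocated_extents: dict, virtual_extents: int, extent_size: int):
    """Compare a thin initial sync with a full-volume copy.

    allocated_extents maps extent index -> data for extents that hold real data.
    Returns (extents to transmit, bytes sent when thin, bytes sent for a full copy).
    """
    to_send = sorted(allocated_extents.items())   # only extents with real data
    thin_bytes = len(to_send) * extent_size
    full_bytes = virtual_extents * extent_size    # a thick mirror sends everything
    return to_send, thin_bytes, full_bytes


# Hypothetical volume: 1,000 virtual extents of 4 bytes, only two ever written.
extents = {7: b"bbbb", 0: b"aaaa"}
to_send, thin_bytes, full_bytes = plan_initial_sync(extents, 1000, 4)
print([index for index, _ in to_send], thin_bytes, full_bytes)
```

The ratio between the two byte counts is the bandwidth saved on the first pass; later incremental updates are a separate matter that this sketch does not cover.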
How Much Money Can Be Saved Using Thin Provisioning?
Boost in Capacity Utilization
The most straightforward way to calculate how much money thin provisioning can save is to measure the boost in capacity utilization. Measure the capacity utilization of a particular application today, then mount the same application on thin provisioned volumes and measure the utilization again. Multiply the difference by the amount of money you spend on storage yearly to estimate the savings. For instance, if thin provisioning boosts storage capacity utilization from 60% to 80%, it saves roughly 20% of the annual storage cost.
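The rule of thumb above is easy to put into a small calculation. The sketch below also shows a second, slightly different estimate under stated assumptions: that the amount of stored data is fixed and that cost scales linearly with raw capacity, in which case the raw capacity needed shrinks by the ratio of the two utilization figures. Both function names are made up for this example.

```python
def rule_of_thumb_saving(util_before: float, util_after: float) -> float:
    """The article's estimate: the difference in utilization, as a fraction of spend."""
    return util_after - util_before


def raw_capacity_saving(util_before: float, util_after: float) -> float:
    """Assumption: same data, cost proportional to raw capacity (data / utilization)."""
    return 1 - util_before / util_after


annual_spend = 1_000_000  # hypothetical yearly storage budget
for estimate in (rule_of_thumb_saving, raw_capacity_saving):
    fraction = estimate(0.60, 0.80)
    print(f"{estimate.__name__}: {fraction:.0%} of spend, about {fraction * annual_spend:,.0f}")
```

The two estimates are close for moderate utilization gains; which one fits better depends on how the storage budget actually scales with capacity.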
IT Costs and Efforts in Expanding LUNs
For the majority of applications, no matter how much spare space was pre-allocated, at some point that space will run out. Typically a volume needs to be expanded every two or three years and, depending on the application, operating system, and storage array capabilities, this operation can take a few hours of intensive work by the IT operators. Having to extend each volume once every three years means that in environments with more than 150 volumes, an expansion will occur about once a week on average. Thin provisioning eliminates this effort.
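The "once a week" figure follows directly from the numbers in the paragraph, as this short calculation shows. The function name is invented for the example, and the hours-per-expansion value is only a placeholder for "a few hours".

```python
def expansions_per_week(volume_count: int, years_between_expansions: float) -> float:
    """Average number of volume expansions per week across the environment."""
    return volume_count / (years_between_expansions * 52)


per_week = expansions_per_week(150, 3)      # figures taken from the text
hours_each = 3                              # assumed stand-in for "a few hours"
print(f"{per_week:.2f} expansions/week, about {per_week * 52 * hours_each:.0f} operator hours/year")
```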
Costs Associated With Power and Cooling
Storage arrays are very high consumers of power and require intensive cooling resources. If thin provisioning boosts the storage utilization by 20%-30%, then the same power consumption supports more data. This should obviously be counted when calculating the ROI.
Special Considerations
As with any new technology, thin provisioning can introduce new challenges and problems that need to be considered.
The immediate problem is the possibility of running out of physical capacity. The application thinks that a large empty space is available, but in reality there is much less physical storage ready for use. In one possible scenario, when the actual capacity is exhausted, access to the volume is blocked until more capacity is added.
This shifts the notion of “end of capacity” from the application servers to the storage. The good news is that the application servers don’t need to deal with this. However, the storage administrator does now have the task of following storage growth and making sure enough physical storage is available.
A good system should provide enough information for the storage administrator to predict capacity growth and to know when more physical storage must be added.
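As an illustration of the kind of prediction an administrator might want, the sketch below extrapolates physical allocation linearly from periodic measurements. The linear-growth assumption, the function name `days_until_full`, and the sample figures are all hypothetical; real growth is often less regular.

```python
from typing import List, Optional, Tuple


def days_until_full(samples: List[Tuple[int, float]], pool_capacity: float) -> Optional[float]:
    """Estimate days remaining after the last sample, assuming linear growth.

    samples is a list of (day, allocated capacity) pairs sorted by day.
    Returns None when allocation is not growing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (first_day, first_alloc), (last_day, last_alloc) = samples[0], samples[-1]
    if last_day == first_day:
        raise ValueError("samples must span more than one day")
    rate = (last_alloc - first_alloc) / (last_day - first_day)  # capacity per day
    if rate <= 0:
        return None
    return (pool_capacity - last_alloc) / rate


history = [(0, 40.0), (30, 46.0), (60, 52.0)]   # TB allocated, made-up readings
remaining = days_until_full(history, pool_capacity=100.0)
print(f"about {remaining:.0f} days until the pool is full")
```

An alert threshold set well ahead of the estimated date leaves time to purchase and install additional physical storage.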
Another new challenge is the fact that not all file systems and databases reuse deleted space. Some of them prefer to always allocate new space rather than reuse space freed by deleting files. This may cause a problem if an application constantly creates and deletes files, because the thin provisioned volume will continue growing, asking for more physical space, while most of the space freed by file deletions goes unused. Best practice therefore is to test each application with thin provisioned volumes before implementing them in the production environment.
Users should also ask the vendor about any available tools to measure lost space and procedures to reclaim the wasted space.
Depending on the implementation, a thin provisioning system can impact performance. This is due to the need to manage large mapping tables and to possible fragmentation. Users should ask the vendor to supply performance benchmarks comparing fully provisioned and thin provisioned volumes.
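The effect of an allocator that never reuses freed space can be seen in a small simulation. This is a deliberately simplified model: blocks that have been written once stay allocated on the thin volume, files are created and immediately deleted, and no reclamation mechanism is modeled. The function name and parameters are illustrative only.

```python
def allocated_after_churn(cycles: int, file_blocks: int, reuse_freed: bool) -> int:
    """Count thin-volume blocks allocated after repeated create/delete cycles."""
    allocated = set()   # blocks ever written; they stay allocated on the thin volume
    free_list = []      # blocks the file system considers free after deletions
    next_new = 0        # next never-used block address
    for _ in range(cycles):
        file_blocks_used = []
        for _ in range(file_blocks):
            if reuse_freed and free_list:
                block = free_list.pop()     # reuse space freed by an earlier delete
            else:
                block = next_new            # always take fresh space
                next_new += 1
            allocated.add(block)
            file_blocks_used.append(block)
        free_list.extend(file_blocks_used)  # the file is deleted again
    return len(allocated)


print(allocated_after_churn(100, 10, reuse_freed=True))
print(allocated_after_churn(100, 10, reuse_freed=False))
```

In both runs the file system holds no live data at the end, yet the second allocator leaves the thin volume holding one hundred times more physical space than the first. This is the growth pattern worth testing for before production use.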
The Future of Thin Provisioning
Thin provisioning will become a standard option on all modern storage systems. As it becomes more common, OSes and applications will take it into account and will optimize their storage allocation mechanisms as well as the information provided to eliminate some of the problems referred to in this article.
Server virtualization and other technologies will accelerate adoption of thin provisioning. In a VMware environment in particular, every virtual machine needs its own boot and data volumes. The size of the boot volume is dictated by the size of the OS and boot software, plus space comparable to the size of memory to accommodate memory dump files when an application fails. In general, users allocate a minimum of 4 to 8 GB for booting, while the real data is less than 1 GB. In a system with hundreds of virtual machines, terabytes of expensive storage are wasted. Thin provisioned volumes allow administrators to give the OS 8 GB of apparent space while in reality allocating only 1 GB. If an application fails, the remaining storage is allocated only when the memory dump file is created.
Another technology that will surely accelerate the adoption of thin provisioning is the flash-based solid state disk. Because solid state disks cost more, users want to raise the utilization of that capacity to 80% or 90%. Since solid state disks don’t suffer from fragmentation performance issues, they mitigate some of the issues involved with thin provisioning on spinning disks.
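The boot-volume arithmetic can be worked through for a concrete fleet size. The 8 GB and 1 GB figures come from the text; the count of 500 virtual machines is an assumed stand-in for "hundreds", and the function name is hypothetical.

```python
def boot_volume_savings(vm_count: int, presented_gb: int, used_gb: int):
    """Return (thick GB, thin GB, saved GB) for the boot volumes of a VM fleet."""
    thick = vm_count * presented_gb   # fully provisioned: every VM gets the full size
    thin = vm_count * used_gb         # thin provisioned: only written data is allocated
    return thick, thin, thick - thin


thick_gb, thin_gb, saved_gb = boot_volume_savings(vm_count=500, presented_gb=8, used_gb=1)
print(f"thick: {thick_gb} GB, thin: {thin_gb} GB, saved: {saved_gb} GB")
```

The saved figure is an upper bound, since any virtual machine that does write a memory dump will grow toward its presented size.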
Thin provisioning provides the opportunity to move storage resource management (SRM) from the application and file level to the storage system and block level. One of the reasons why SRM software wasn’t successfully adopted was the need to measure the used space at the application server and file system level. In environments with hundreds or thousands of servers with different OSes and file systems, installing and maintaining the SRM software became a nightmare. Furthermore, SRM lacks visibility at the block level, so if a file system used 1 TB and the array replicated that data three times, the SRM software couldn’t tell that 3 TB were actually in use.
When volumes are thin provisioned, the allocated space is measured at the block level rather than at the file level. This has the advantage that it can be measured in a central location, independent of the number or type of application servers. Moreover, it can take into account replication that takes place at the block level. A new, more useful generation of SRM software will emerge as a result of thin provisioning.
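A central, block-level usage report of the kind described here could be as simple as the following sketch. It assumes the storage system exposes, per volume, the allocated capacity and the number of block-level copies it keeps; the `VolumeUsage` record and its field names are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VolumeUsage:
    name: str
    allocated_tb: float     # real space allocated by the thin-provisioning layer
    block_copies: int = 1   # block-level copies kept by the array (assumed field)


def physical_usage_tb(volumes: List[VolumeUsage]) -> float:
    """Total physical capacity consumed, counting block-level replication."""
    return sum(v.allocated_tb * v.block_copies for v in volumes)


report = [
    VolumeUsage("fs1", allocated_tb=1.0, block_copies=3),   # 1 TB kept in three copies
    VolumeUsage("fs2", allocated_tb=0.5),
]
print(physical_usage_tb(report))
```

Because the figures come from the array rather than from agents on each server, the report does not depend on which OS or file system sits on top of each volume.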
Action Item: Thin provisioning is becoming very popular because it offers substantial savings in cost and time. It is also positioned to reduce the complexity of current storage infrastructures. As this article indicates, not all applications are suitable for thin provisioning, and it can introduce new challenges. Nonetheless, as this technology becomes more widely adopted and new adjacent technologies emerge, those challenges will be overcome, making thin provisioned volumes the only way to allocate storage.