Thin provisioning Introduction
Thin provisioning, sometimes called "over-subscription," is an important emerging storage technology. This article defines thin provisioning, describes how it works, identifies some challenges for the technology, and suggests where it will be most useful.
If applications run out of storage space, they crash. Therefore, storage administrators commonly install more storage capacity than required to avoid any potential application failures. This practice provides 'headroom' for future growth and reduces the risk of application failures. However, it requires the installation of more physical disk capacity than is actually used, creating waste.
Thin provisioning software allows higher storage utilization by eliminating the need to install physical disk capacity that goes unused. Figure 1 shows how storage administrators typically allocate more storage than applications need, planning ahead for growth and ensuring that applications won't crash because they run out of disk space. In Figure 1, volume A holds only 100 GB of physical data but has been allocated much more than that based on growth projections (500 GB, in this example). The unused storage allocated to the volume cannot be used by other applications. In many cases the full 500 GB is never used and is essentially wasted. This is sometimes referred to as "stranded storage."
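The arithmetic behind the volume A example can be written out as a short Python sketch. The function names are illustrative only, and the figures are the ones quoted above (500 GB allocated, 100 GB used).

```python
def stranded_capacity_gb(allocated_gb: float, used_gb: float) -> float:
    """Capacity reserved for a volume that holds no data."""
    if used_gb > allocated_gb:
        raise ValueError("used capacity cannot exceed allocated capacity")
    return allocated_gb - used_gb


def utilization(allocated_gb: float, used_gb: float) -> float:
    """Fraction of the allocated capacity that actually holds data."""
    return used_gb / allocated_gb


# Volume A from the example: 500 GB allocated, 100 GB of data.
stranded = stranded_capacity_gb(500, 100)
util = utilization(500, 100)
print(f"stranded: {stranded} GB, utilization: {util:.0%}")
# prints "stranded: 400 GB, utilization: 20%"
```

In other words, four fifths of the capacity installed for volume A sits idle under conventional allocation.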
In most implementations, thin provisioning provides storage to applications from a common pool of storage on an as-required basis. Thin provisioning works in combination with storage virtualization, which is essentially a prerequisite for using the technology effectively. With thin provisioning, a storage administrator allocates logical storage to an application as usual, but the system releases physical capacity only when it is required. When utilization of that storage approaches a predetermined threshold (e.g. 90%), the array automatically provides more capacity from a virtual storage pool, expanding the volume without involving the storage administrator. The volume can be over-allocated as usual, so the application thinks it has plenty of storage, but now there is no stranded storage. Thin provisioning is on-demand storage that essentially eliminates allocated but unused capacity.
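The mechanism described above can be modeled with a minimal Python sketch. This is a toy model under stated assumptions, not any vendor's implementation: the class names `ThinPool` and `ThinVolume`, the 10 GB initial size, the 10 GB growth step, and the 200 GB pool are all hypothetical, and only the 90% threshold and the 500 GB / 100 GB figures come from the text.

```python
class ThinPool:
    """Shared pool of physical capacity backing thin volumes (sizes in GB)."""

    def __init__(self, physical_gb: int):
        self.free_gb = physical_gb

    def take(self, gb: int) -> None:
        if gb > self.free_gb:
            raise RuntimeError("pool exhausted: no physical capacity left")
        self.free_gb -= gb


class ThinVolume:
    """Volume that advertises logical_gb but is backed only as needed."""

    def __init__(self, pool, logical_gb, initial_gb=10, step_gb=10, threshold=0.9):
        self.pool = pool
        self.logical_gb = logical_gb  # size the application sees
        self.physical_gb = 0          # capacity actually drawn from the pool
        self.used_gb = 0
        self.step_gb = step_gb        # hypothetical growth increment
        self.threshold = threshold    # e.g. 0.9 for the 90% trigger
        self._grow(initial_gb)

    def _grow(self, gb):
        # Never back more than the logical size promised to the application.
        gb = min(gb, self.logical_gb - self.physical_gb)
        self.pool.take(gb)
        self.physical_gb += gb

    def write(self, gb):
        new_used = self.used_gb + gb
        if new_used > self.logical_gb:
            raise ValueError("write exceeds the volume's logical size")
        # Expand from the pool until utilization drops below the threshold.
        while (self.physical_gb < self.logical_gb
               and new_used >= self.threshold * self.physical_gb):
            self._grow(self.step_gb)
        self.used_gb = new_used


pool = ThinPool(physical_gb=200)
vol = ThinVolume(pool, logical_gb=500)
vol.write(100)
print(vol.logical_gb, vol.physical_gb, pool.free_gb)  # prints "500 120 80"
```

The application still sees a 500 GB volume, but only 120 GB of physical capacity has been consumed, and the remaining 80 GB in the pool stays available to other volumes.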
There are some challenges with thin provisioning technology, and some areas where it is not currently recommended:
- Data that is deleted from a volume needs to be reclaimed, which can add to storage controller overhead and increase cost.
- File systems (e.g. Microsoft NTFS) that write to previously unused blocks in preference to reusing released blocks cause volumes to expand to their maximum allocated size before any storage is reused. This negates the benefits of thin provisioning.
- Applications that spread metadata across the entire volume will obviate the advantages of thin provisioning.
- Applications that expect data to be contiguous, or that optimize I/O performance around that assumption, are not good candidates for thin provisioning.
- If a host determines that there is sufficient available space, it may allocate it to an application, and the application may deploy it. This space is virtual, however, and if the array can't provision real new storage fast enough, the application will fail. High-performance controllers and good monitoring of over-provisioned storage will be required to avoid reduced availability.
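The monitoring called for in the last point can be sketched as a simple check on two numbers: how much logical capacity has been promised relative to what is installed, and how full the physical pool actually is. The sketch below is illustrative only; the `PoolStatus` and `check_pool` names, the 80% warning level, and the sample volume sizes are assumptions, not values from any particular array.

```python
from dataclasses import dataclass


@dataclass
class PoolStatus:
    oversubscription: float  # logical capacity promised / physical capacity installed
    pool_utilization: float  # physical capacity consumed / physical capacity installed
    needs_attention: bool    # True once utilization reaches the warning level


def check_pool(logical_gb, used_gb, physical_gb, warn_at=0.8):
    """Summarize over-provisioning risk for one pool (all sizes in GB)."""
    oversub = sum(logical_gb) / physical_gb
    util = sum(used_gb) / physical_gb
    return PoolStatus(oversub, util, util >= warn_at)


# Three hypothetical volumes sharing a 1,000 GB physical pool.
status = check_pool(
    logical_gb=[500, 500, 1000],
    used_gb=[100, 250, 500],
    physical_gb=1000,
)
print(status)
```

Here the pool is over-subscribed two to one and 85% full, so the check flags it: hosts still believe 1,150 GB is free, but only 150 GB of real capacity remains.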
As thin provisioning technology matures, applications and file systems will be built and modified to avoid these kinds of problems. The economic justification for thin provisioning is simple: it makes storage allocation automatic, which significantly reduces the storage administrators' work, and it can reduce the amount of storage required to service applications. It also reduces the number of spinning disk drives required and therefore will result in substantial reductions in energy consumption.
Action Item: Thin provisioning can provide some major advantages in increasing overall storage utilization and should be seriously considered when virtualizing a data center. However, users should be aware of the caveats and should examine the storage requirements and management of their applications to identify any that are poor candidates for this approach.