Storage Peer Incite: Notes from Wikibon’s July 10, 2007 Research Meeting
This week Wikibon presents Tape: Not wrapped up yet. Despite 20 years of discussion characterizing it as a dinosaur facing extinction, tape is still with us, and it is not likely to go away any time soon. Tape remains the most cost-effective solution to the business needs of data backup, off-site storage, and, in the case of disaster, emergency restoration, for several reasons. First, tape can be easily handled, transported and stored off its drive, while removable disk is much more delicate and prone to damage. Second, tape has a longer effective lifespan than any form of disk, including WORM, available today, so that data can be put on a tape and recovered 10-50 years later, while data on disk has to be moved to a new disk every five years. Third, because of the fragility of removable disk, non-tape, off-site storage solutions depend on high-bandwidth network connections to move very large volumes of data. While network costs have dropped steadily, the volume of data being stored and, therefore, in need of backup, has grown equally quickly, making this an expensive solution. Fourth, since the 1990s tape has had more capacity than even the largest disks, so that today one tape can hold the data backed up from several disks in most shops. Finally, again because removable disk is fragile, most disk solutions keep the disks on drives, which then have to be powered, while tape can be put into cases in a vault, where it uses zero power. As the cost of power continues to rise, its impact on the IT budget will only grow. For all these reasons, despite the widely held belief to the contrary, we firmly believe that tape is here to stay in medium-to-large organizations. Bert Latamore
Twenty years of talking about the demise of tape has become a habit, yet large and medium-sized organizations still find tape to be a viable technology option for many data management challenges. For organizations that need very fast recovery, tape remains a key technology that is unlikely to go away soon. The promise that bandwidth would replace trucks for moving large amounts of data has not come true because the volume of data that needs to be backed up is growing faster than bandwidth, and will continue to do so for the foreseeable future. This gives tape a financial advantage over disk as an off-site backup technology, one that is compounded by the lower energy cost of tape versus disk over the lifetime of the stored data. Tape also has a much longer storage life than disk, eliminating the need to move data to new media during its lifetime. Organizations that need a cheap copy of data in a “write once/read rarely” mode are also attracted by tape economics, which show a 3X to 100X price advantage over disk technologies and make tape a viable approach to sustaining their data.
One primary reason that tape continues to be viable is that since the mid-1990s the tape industry has delivered media exceeding the capacity of the largest disk. That changed the dynamic of backing up volumes: one tape could back up multiple disks rather than the reverse. Consequently, tape’s advantages in remote storage, price, and its association with legacy applications designed to use it rather than disk for archiving continue to outweigh the issues of finding specific files, physically moving and securing tapes, and the performance limits of serial access.
In the last few years we have started seeing some vendors push hard to advance truly integrated tape and disk library technology (as opposed to pure disk “virtual tape” solutions) outside the mainframe world to minimize tape’s issues while maximizing its advantages by delivering automated 90/90 solutions (90 days on disk/90 years on tape). These products show significant promise for optimizing the cost/benefit ratio of long-term storage administration, so the biggest question in tape’s future is not its viability but whether the tape vendors will step up to drive the technology forward.
It is clear that the application needs for tape are in place. What is less clear is whether storage suppliers will continue to be seduced by “disk only” product lines that are easier to market but don’t necessarily solve the core problems of backup, restore and solid data movement and administration.
Action Item: The practices and architectures associated with tape are becoming issues in today’s market. Users should continue to focus attention on data classification, deduplication and other architectural practices associated with backup and archive. They should also push vendors to find ways to deliver high-quality tape solutions that meet the real business needs of storage administration and data archiving.
During its call today to announce new tape products, IBM presented a model stressing that tape is greener than disk by a wide margin and that tape innovation continues despite pressure from emerging disk technologies. The IBM model also showed that tape costs are lower than disk costs. IBM analyzed a 250TB data requirement growing at 25% per annum and demonstrated that over ten years:
- A SATA disk configuration costs $6.4M (excluding labor costs)
- An LTO tape library costs under $1M (excluding labor)
- A blended disk/tape combination costs about $2.3M (excluding labor)
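The arithmetic behind a model like this can be sketched in a few lines of Python. This is a rough illustration, not IBM's actual model: it assumes the 25% growth compounds once a year from a 250TB starting point, and it treats "under $1M" as an upper bound of $1M for the tape library. The function and variable names are hypothetical.

```python
def projected_capacity_tb(initial_tb, annual_growth, years):
    """Capacity at the end of each year, assuming annual compounding."""
    return [initial_tb * (1 + annual_growth) ** year for year in range(years + 1)]


# Ten-year figures quoted from the IBM model, in millions of dollars, excluding labor.
# Assumption: "under $1M" is represented by its upper bound, 1.0.
TEN_YEAR_COST_MUSD = {"sata_disk": 6.4, "lto_tape": 1.0, "blended": 2.3}


def cost_relative_to_tape(costs):
    """Express each configuration's cost as a multiple of the tape figure."""
    tape = costs["lto_tape"]
    return {name: cost / tape for name, cost in costs.items()}


capacities = projected_capacity_tb(250, 0.25, 10)
ratios = cost_relative_to_tape(TEN_YEAR_COST_MUSD)

for year, tb in enumerate(capacities):
    print(f"year {year:2d}: {tb:8.1f} TB")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the tape figure")
```

Under these assumptions the requirement grows from 250TB to roughly 2,330TB by the end of year ten. Because the tape figure is an upper bound, the SATA configuration costs at least 6.4 times as much as tape, and the blended configuration at least 2.3 times as much.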
However, the innovations of MAID, disks in a cartridge, removable disk/SATA hybrids and VTL are taking the TCO of disk closer to that of tape. While tape libraries currently enjoy a lower $/GB and OPEX than any type of disk, the predominant use of tape is in data centers and the upper end of the SMB market. Tape has effectively been eliminated on the desktop, on very low-end servers, in low-end SMB applications, and in the car or home. The total available market for tape appears to be shrinking as the reach of disk expands.
Nonetheless, the media life of new tape is now 15-30 years (versus 5-8 years a decade ago). SATA disks have a useful life of about five years before migration and conversion are needed. Tape is the better long-term archival medium based on cost, media life and ease of portability. Notably, tape consumes orders of magnitude less power than disk.
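To make the media-life comparison concrete, the sketch below counts how many times data would have to be copied to fresh media over a retention period. The 30-year retention horizon is a hypothetical value chosen for illustration, and the function name is an assumption. The media lifetimes are the ones quoted above.

```python
def migrations_needed(retention_years, media_life_years):
    """Number of times data must be copied to fresh media during retention."""
    if retention_years <= 0 or media_life_years <= 0:
        raise ValueError("retention and media life must be positive")
    # Ceiling division gives the number of media generations used.
    generations = -(-retention_years // media_life_years)
    # The first generation holds the original write, so it is not a migration.
    return generations - 1


RETENTION_YEARS = 30  # hypothetical retention requirement

for label, life in [
    ("SATA disk, 5-year life", 5),
    ("tape, 15-year life", 15),
    ("tape, 30-year life", 30),
]:
    print(f"{label}: {migrations_needed(RETENTION_YEARS, life)} migrations")
```

With these inputs, disk needs five migrations over 30 years, while tape needs one or none depending on which end of the 15-30 year range applies. Each migration carries labor cost and risk that the simple media price does not show.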
While backup applications are moving to disk, archival, fixed content and compliance applications are still best economically suited for automated tape. Tape clearly is more than backup. The tape industry will need to answer the de-duplication function with some other type of data reduction scheme, and open systems VTLs will adopt de-duplication technologies in the near term.
So where does this all leave tape? The future of tape includes disk, that is, a disk array that front-ends an automated tape library. In this arrangement, users get the performance of disk for more active data and the economics of tape for the larger amounts of archival data, with data movement occurring directly from disk to tape without involving any server or appliance. This links tier 2 and tier 3 storage together where migration can, in theory, be automated.
Action Item: When it comes to tape decisions, consider applications before anything else. Long term T3 storage (archival, compliance and fixed content apps) will remain the domain of tape as the economics are more favorable. A blended disk / tape approach is emerging as a likely reference model for T2-->T3 storage migration and IT should prioritize bringing mainframe-like processes to manage the movement of data from online to nearline to offline.
A decision to rid a data center of tape technology requires three sets of actions. First, tape hardware must be exited. Second, tape-based backup/restore software must be exited. Third, data on old tapes must be migrated. Organizations convinced that the costs of tape hardware (performance of any sequential access technology, administrative and security complexities associated with removable media, etc.) so outweigh the benefits (cheaper than dirt media, low transportation costs, longest media life, etc.) can replace their tape hardware with virtual tape libraries (VTL). SATA drive-based VTL products typically are highly, if not completely, compatible with leading backup/restore system software, utilizing simple volume mapping technologies to present applications with common tape formats, controls, and administrative tools; they mitigate the performance and administrative complexities of tape, but typically must operate within the same metropolitan region as the data center, which undermines disaster management edicts at most large shops. Moreover, VTL products do not rid an organization of tape-orientated backup/restore software, which can still be very expensive and cumbersome to operate relative to more "modern" software packages. Finally, data on tape media do not magically "jump" from old tape transports to new VTL targets. The data must be moved, which will be a laborious process fraught with opportunities for human error.
Action Item: Be wary of efforts to get rid of tape for the sake of getting rid of a class of old technology. Focus decisions on the complexities and success scenarios of migrating data and backup/restore software tools (very hard), and not the relatively easy actions required to replace tape transports with VTL disk.
Tape is the cheapest medium for storing data long term, is environmentally very friendly, is easily transportable, and has the lowest RPO and the highest bandwidth (when physically transported on the highway). The tape of the future will require the integration of disk and network technologies to enhance the platform.
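The "highest bandwidth on the highway" claim is easy to check with rough arithmetic. The sketch below is illustrative only: the cartridge count, per-cartridge capacity, transit time and link speed are all hypothetical inputs, capacities are in decimal terabytes, and the calculation ignores the time needed to write the tapes at the source and read them at the destination.

```python
def effective_bandwidth_gbit_s(cartridges, tb_per_cartridge, transit_hours):
    """Average data rate, in Gbit/s, of a shipment of tapes in transit."""
    if transit_hours <= 0:
        raise ValueError("transit_hours must be positive")
    total_bits = cartridges * tb_per_cartridge * 1e12 * 8
    return total_bits / (transit_hours * 3600) / 1e9


def network_transfer_hours(total_tb, link_gbit_s):
    """Hours needed to send the same data over a fully utilized link."""
    if link_gbit_s <= 0:
        raise ValueError("link_gbit_s must be positive")
    total_bits = total_tb * 1e12 * 8
    return total_bits / (link_gbit_s * 1e9) / 3600


# Hypothetical shipment: 1,000 cartridges of 0.8 TB each, delivered in 24 hours.
truck_rate = effective_bandwidth_gbit_s(1000, 0.8, 24)
# The same 800 TB over a hypothetical dedicated 1 Gbit/s link.
link_hours = network_transfer_hours(1000 * 0.8, 1.0)

print(f"truck: about {truck_rate:.0f} Gbit/s")
print(f"1 Gbit/s link: about {link_hours / 24:.0f} days")
```

For these inputs the shipment averages about 74 Gbit/s, while the same data would occupy a saturated 1 Gbit/s link for roughly 74 days. The comparison covers throughput only; latency to the first byte still favors the network.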
When that future will arrive is uncertain, as tape manufacturers are struggling to regain their marketing moxie. While the tape market sorts itself out, it would be tempting for users to try to implement their own solutions integrating tape and other technologies. This temptation should be resisted in large data centers, as should any attempt to phase out tape. Rather, good practice from the mainframe world should be implemented in the open world.
Action Item: Storage executives should resist attempts to integrate tape and disk technologies, and wait for vendors to provide solutions. In a time of flux, it will be unclear what solutions will survive in the marketplace. Storage executives should not be afraid to implement different solutions for each storage pool.
There is a rapidly growing segment of data that needs to be available 24x7 to all the stakeholders of an organization: customers, partners, suppliers, and the community. This data needs to be optimally placed, balancing user response time and bandwidth usage against the cost of holding multiple distributed copies.
File systems such as those from Amazon (Amazon S3) and Google (GFS) spread the data over the network with multiple servers, storage devices and data centers, using commodity hardware and software. The file system is designed to expect failures from the technologies and the software, and to be able to recover from any such failure.
If this technology is solid, does it obviate the need for this class of data to be backed up? Time will tell whether tape is dead for this class of data, but if it is, that introduces a significant potential saving in storage costs and complexity. What is clear is that many new storage topologies will be possible with different combinations of storage and network technologies.
Action Item: It will take some years before the market decides what combinations of storage topologies are optimum, as vendors and customers try different approaches to melding file system, disk storage, tape storage and network technologies. Organizations should ensure that experience is gained with multiple storage approaches, and that they are tested at all stages from application design to operational implementation. Storage needs to be organized to encourage different approaches, including outsourcing storage to third parties.
With STK gone, who will be the new "King of Tape"? While IBM looks to be the early favorite, there is no clear innovator or leader in the space that makes tape its top priority and funds the business accordingly. At IBM, services remains #1, and at Sun (now Oracle), it doesn't appear tape is on a par with servers, Java or battling Windows. Despite HP's approximate 20% market share, generating demand for ink is its main goal, far surpassing any potential interest in dominating the tape market.
Companies like EMC and NetApp, which make virtually all their revenues from disk-related technologies and services, will continue to try to position tape's death as imminent; however, they are not likely to succeed any time soon. Tape vendors must fight this challenge by articulating clearly that the future of tape includes disk and by creating direct pathways between T2 and T3 storage. Specifically, they need the ability to bring together T2 and T3 strategies with data movement approaches that don't involve servers or appliances.
Tape vendors who have data center experience with T2-->T3 migration strategies can bring substantial credibility to the table, drawing on years of experience helping customers solve a variety of problems related to backup, remote replication, archiving, compliance and fixed content management. Tape vendors, however, must address data de-duplication directly or with other data reduction strategies to preserve the backup market. This includes placing de-duplication functions in VTLs and siphoning data off to tape while the VTL controller manages 'house cleaning' items.
Action Item: Bringing mainframe-class automation to T2-->T3 migration and open systems is a viable business opportunity. Tape vendors must fill this void or continue to be marginalized by clever disk marketers.