Contrary to popular belief, how you archive matters more than what or why you archive. For the broad market, the notion of non-archived data has become antiquated. Getting rid of old data means spending the time and resources required to decide what can be deleted, and most data managers are not comfortable making those decisions. So today virtually everything is being stored forever, which generates huge repositories of data and content and creates an urgent need for a data storage architecture that will thrive in this new “store everything forever” era.
Disk and Tape Uses Now Overlap – Providing More Choices
Questions of what and why to archive mattered a lot more when content owners and data managers had to make substantial tradeoffs between primary storage and archive storage. Disk was used for primary storage and tape was used for archive. Primary disk was very expensive and tape archive imposed a significant management burden and limited access to data. Recent technology developments have changed this equation. Disk storage is no longer cost prohibitive for long-term storage. Tape solutions are no longer management-intensive, nor do they limit data access. However, it has to be the right disk or tape solution for these statements to be true and to make good sense for today’s archive usage.
Overarching Characteristics of Modern Day Archive
The latest technology developments can have an enormous positive impact on the efficiency of an archive approach, provided one keeps the following requirements in mind:
- Low-cost storage. Cost savings is still a key motivation for pursuing an alternative to primary storage.
- Data durability. Archive data must be well protected, and the need for durability encompasses site disaster as well as storage component failures. Archive is about retaining data, not about moving it and losing it.
- Easy data access. Archive data must be easily accessible – if not, then why bother?
- Unlimited scalability. Today’s architecture needs to scale easily to realize the cost savings.
- Non-disruptive technology migration. Solutions must be able to migrate non-disruptively to new component technologies as they emerge, thereby providing long-term benefits and cost savings from today’s investment.
New Technology Features that Optimize Archive Storage
There are several hot new technology options available that must be considered when an archive strategy is being implemented:
1) Erasure Code-Based Object Storage. The greatest thing since sliced bread in the disk archive food chain is the advent of erasure code technology, which adds redundant data to protect against component failure, much as RAID adds parity as overhead. Erasure code differs from RAID, however, in that it disperses the redundancy across many storage components, whereas RAID operates on a fixed set of hardware components. Erasure code’s dispersal algorithms translate individual files or objects into many data elements, each of which carries a small amount of redundancy, so that only a portion of the data elements is needed to retrieve the complete object. When erasure code-based object storage is deployed, data is naturally protected against hardware component failure without the need for replication. In addition, when object storage is spread across multiple sites (referred to as “geo-spreading”), data is further protected from site-level disasters, also without the need for replication. Because data replication is not required, much less hardware is needed to store and protect data. This is fundamentally why data storage that leverages erasure code technology can deliver substantial reductions in hardware cost. Additionally, because less data is under management, software costs are likely to decrease as well. Finally, because erasure code handles failures at the level of individual components or drives, rather than tying protection to a fixed set of hardware as RAID does, components can be updated to new technology without a disruptive forklift upgrade.
2) LTFS and NAS Tape. Word is getting out about LTFS (Linear Tape File System). This technology was introduced in 2010 and enables entirely new usage models for tape. LTFS offers a complete self-describing file system on a tape cartridge, which allows users to read and write data to tape just as if it were an extension of their file system. Users can literally drag files to a tape cartridge and never again worry about having to use a proprietary backup application to get data to and from tape. There are several solutions on the market today that enable a large tape library to be accessed as a NAS share – how much easier can access to data on tape get? A large and increasing number of software solutions support the LTFS format today. Because LTFS is being driven as an open standard in the SNIA organization, LTFS tapes are well-suited for long-term archive applications: data written in an open-standard format is more likely to be readable by systems of the future. LTFS software enables a whole new level of access and portability for data on tape.
3) Data and Media Integrity Checking. The revolutionary features related to data durability in the tape world are data integrity checkers. There are a few offerings that enable the user to set policies that dictate how often a tape cartridge should be rotated into a drive to test the integrity of the media and the data on that media. This is like rotating wine bottles for long-term keeping. However, unlike rotating wine bottles, users have the ability to act on media that is suspect to prevent data loss.
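To make the erasure code idea in item 1 more concrete, here is a minimal sketch in Python. It uses the simplest possible erasure code, a single XOR parity element, which can rebuild an object after losing any one element. This is an illustration under stated assumptions, not how any particular product works: production object stores typically use stronger codes (Reed-Solomon, for example) that survive several simultaneous losses. The function names and the 16+4 layout in the usage lines are hypothetical and chosen only for the example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal elements (zero-padded) plus one XOR parity element."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the object when at most one element (data or parity) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can recover only one lost element")
    shards = list(shards)
    if missing:
        # XOR of all surviving elements reproduces the lost one.
        present = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity element, rejoin the data, and strip the padding.
    return b"".join(shards[:-1])[:original_len]


def raw_capacity_factor(k: int, m: int) -> float:
    """Raw capacity consumed per unit of user data with k data + m redundancy elements."""
    return (k + m) / k


# Usage: lose one element (simulating a failed drive) and still read the object.
obj = b"archive me forever"
pieces = encode(obj, k=4)
pieces[2] = None
assert decode(pieces, len(obj)) == obj

# Illustrative overhead comparison (the 16+4 layout is an assumed example).
print(raw_capacity_factor(16, 4))  # 1.25x raw capacity
print(3.0)                         # three full copies under 3-way replication
```

The comparison at the end is the cost argument in miniature: an assumed 16+4 layout consumes 1.25 times the user data in raw capacity, whereas keeping three full replicas consumes 3 times.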
Action Item: What to do Now
It is clear that both disk and tape can play a very active role in today’s store-everything world. With the technologies mentioned above, both can offer tremendous scalability and data durability. The right choice will be driven by scale, user access requirements and projected data growth rates.
For small shops under 100 TB: take a serious look at tape systems that capitalize on LTFS, deliver NAS access to tape and include data integrity features. These solutions are so good today that some people may even forget they are using tape.
For larger environments, choices have never been better. Evaluate solutions that leverage erasure code object storage. These can deliver the most comprehensive solution for access to data and cost effectiveness. If budget is too restrictive, large NAS tape libraries continue to provide the convenience of NAS access to data with a small trade-off of higher data access latency in return for lower cost.
Footnotes: About Mark Pastor:
Mark Pastor is product marketing manager for archive products at Quantum. Mark represents Quantum within the Active Archive Alliance. He regularly blogs on topics relating to data protection and archive, and his writing has appeared in Information Management and Data Center Post.