Storage Peer Incite: Notes from Wikibon’s August 28, 2007 Research Meeting
Moderator: Peter Burris & Analyst: David Floyer
This week Wikibon presents Dialing in on CDP. Continuous data protection (CDP), and its close relative, data snapshot capture technology, are, as Peter Burris pointed out in our most recent Peer Incite Meeting, important initiatives to solve two problems that have plagued IT since its beginnings and which to date do not have full answers -- accidental deletion or alteration of data and data corruption caused by unexpected interactions between applications.
Normal backup/restore methods, ranging from traditional tape backup to the three-data center architecture, are too broad-brush to diagnose and repair these specific and very important problems. CDP provides answers to these problems because it works at a more granular level -- individual applications, even individual instances on particular desktops. It therefore allows restoration of specific data for an application from versions that are minutes or hours old, rather than requiring cumbersome restoration just to recover a particular vital spreadsheet or e-mail that someone accidentally deleted.
CDP for files is a clean solution; for block-based storage, however, it is not perfect. Because it essentially records writes to disk while most applications keep some data in buffers, the buffered data must be recovered from logs during a restore and could be inconsistent. CDP is also expensive. Because CDP logs every write, it more than doubles the storage needed for the data being backed up, and it has implications for network loads. Disk-based snapshot copy offers a less expensive alternative and has the advantage that buffers can be flushed before each snapshot if the application supports it. In a restore, however, this sacrifices data written after the last snapshot, yielding a longer recovery point objective (RPO) and recovery time objective (RTO), although these are still much shorter than the typical 12-24 hour RTO of recovery from tape. The frequent buffer flushes will also affect application performance. Bert Latamore
Dialing in on CDP
Continuous data protection (CDP) (and here we include its close cousin, data snapshot capture technology) is a new member of the family of technologies dedicated to improving the overall recoverability of data. However, it offers an interesting departure that is especially effective as we move into a more user-driven, virtualized application world.
Historically, data recovery and protection technologies emphasized the need to recover from machine failures. Because data recovery technology evolved at a time when hardware failure was common, most data protection and recovery technologies were designed to protect against larger server, storage device, or other machine-level problems. Since then they have been expanded to provide data center-wide recovery protection in the event of a regional disaster.
However, even as organizations were implementing significant backup, restore and recovery technologies, IT recognized that human error and unanticipated application interaction conflicts were also sources of significant data loss and corruption. In general technologies like replication, tape backup, etc., made no effort to address these more complex challenges for backing up and restoring data.
CDP is an important initial foray into the world of protecting organizations from information loss from these more localized but still sometimes devastating events. Effectively, CDP creates a running log of writes at the storage level, tagging each disk event with a time stamp to improve the granularity of recovery point objective (RPO) goals down to milliseconds. Consequently, when a user makes an error, such as an Exchange user inadvertently deleting a critical e-mail or a software developer inadvertently changing a critical development subsystem under test, it is possible to recover just that data at just the point prior to the execution of the write.
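The time-stamped write log described above can be sketched roughly as follows. This is a minimal illustration only, not how any particular CDP product works: the `WriteJournal` class, its method names, the block identifiers, and the use of `None` to mark a deletion are all hypothetical, and a real system would journal at the storage layer rather than in memory.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class WriteJournal:
    """Hypothetical in-memory stand-in for a CDP write log."""

    # Each entry is (timestamp_ms, block_id, data), appended in time order.
    entries: list = field(default_factory=list)

    def record(self, timestamp_ms, block_id, data):
        """Log one write, tagged with its time stamp."""
        if self.entries and timestamp_ms < self.entries[-1][0]:
            raise ValueError("writes must be logged in time order")
        self.entries.append((timestamp_ms, block_id, data))

    def restore_before(self, timestamp_ms):
        """Rebuild block contents as they stood just before timestamp_ms."""
        times = [entry[0] for entry in self.entries]
        # bisect_left excludes any write stamped exactly at timestamp_ms,
        # so the offending write itself is not replayed.
        cutoff = bisect_left(times, timestamp_ms)
        image = {}
        for _, block_id, data in self.entries[:cutoff]:
            image[block_id] = data  # later writes overwrite earlier ones
        return image


journal = WriteJournal()
journal.record(1000, "mailbox/42", "critical e-mail")
journal.record(2000, "mailbox/42", None)  # assumed convention: None = deleted
recovered = journal.restore_before(2000)  # state just prior to the deletion
```

Replaying the log up to, but not including, the bad write is what allows recovery of "just that data at just the point prior to the execution of the write."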
In today’s world, where applications are increasingly tied to the system’s capability to support complex human activity including mobile, collaborative and other types of complex interactions, CDP, which promises information recovery from human error, becomes a more important technology option in the backup/restore continuum.
Additionally we anticipate that the effort to increasingly virtualize physical computing resources will lead to circumstances in which application interactions are no longer predictable, forcing IT toward a CDP-type solution to protect an organization from the consequences of unanticipated application interactions.
The benefits of CDP can be high. However, costs can also be significant: For a modest-sized Exchange implementation it is easy to envision an initial $100,000 out-of-pocket expense for installation and $40,000 per year for maintenance. Having said that, CDP emerges as an important new member of the overall backup/restore continuum requiring immediate attention from IT organizations in which the effort to protect information has become paramount due to business needs.
Action item: CDP is a backup and restore technology. However it must not be considered solely in the context of traditional backup/restore technologies but should rather be regarded in terms of the need to protect the business from human error and complex application interactions. Under circumstances where users face burgeoning application complexity, CDP becomes a more viable backup/restore option.
CDP for applications that cannot sit still
As application development (e.g., open source LAMP stack), networking (e.g., WiFi), and end-user device (e.g., iPhone) technologies advance, information system support is introduced into more complex operations domains. In today’s business world, critical contracts are being significantly altered in response to complex queries executed from a telephone in a restaurant late at night and hundreds of miles from a secure terminal.
As a result, opportunities for significant information loss as a consequence of simple human error are increasing dramatically: fingers slip, styluses slide, and buttons are inadvertently pressed. Under traditional regimes for application management (e.g., TP applications), the damage of random human carelessness (to put it nicely) is kept to a minimum by separating the activities of application administration (e.g., pricing table maintenance) and application usage (e.g., change order count). However, new collaborative applications often blend (if not fuse) these two activities (e.g., setting up working groups to pursue a specific opportunity). Consequently, increasingly critical applications are being exposed to the vagaries of human comedy. Continuous data protection (CDP) technologies that can pinpoint recovery activities to specific information objects in “human” time, and that can minimize the effects of a single user’s mistake on a large user community, are emerging as an important tool for storage administrators who cannot depend on an application’s intrinsic recovery function.
Action Item: Continuous data protection (CDP) products should be instituted by IT organizations to protect increasingly complex and critical message-based, collaborative applications (e.g., mail, workflow) as the technologies warrant. Assumptions that an application’s unique approach to restoring function will suffice in the event of unique human errors are becoming increasingly difficult to justify.
Target CDP at life or death applications
Ten years ago, hardware failures were a big issue in data loss. That is no longer the case, as technologies like mirroring and RAID have addressed this problem. The main culprits of data loss today are human error and/or unanticipated application interactions that lead to data corruption. The Wikibon community estimates that human error and software glitches can account for more than 50% of data-loss incidents. This is the domain of continuous data protection (CDP).
CDP, whether file- or block-based, should be targeted at those 'life or death' applications that are deemed mission critical. These are often:
- 7x24 applications
- Applications with high write:read ratios
- Mission-critical databases
- Applications with stringent recovery requirements
- Applications enduring lots of change (e.g., early-in-life)
CDP, however, can be expensive. It often requires 2.5-5X the amount of storage for the applications being continuously backed up, CDP software licenses (often upwards of $50K), increased network bandwidth and server overhead, integration/implementation costs (often in excess of $75K), testing, and ongoing maintenance costs of $30K-$40K annually.
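As a rough illustration of how these figures combine, the sketch below adds up the quantified items for one application. It is a back-of-the-envelope estimate under stated assumptions, not a pricing model: the function name `cdp_cost_range`, the per-terabyte storage price, and the example inputs are hypothetical; the license and integration figures are the "often upwards of" floors quoted above; and network, server overhead, and testing costs are left out because the text does not quantify them.

```python
def cdp_cost_range(primary_tb, cost_per_tb, years,
                   license_cost=50_000,            # "often upwards of $50K"
                   integration_cost=75_000,        # "often in excess of $75K"
                   maintenance_per_year=(30_000, 40_000),
                   storage_multiplier=(2.5, 5.0)):
    """Return a (low, high) rough total cost over `years`.

    Assumes the CDP repository is 2.5-5X the protected primary capacity
    and that cost_per_tb (a hypothetical input) applies to that repository.
    """
    one_time = license_cost + integration_cost
    low = (one_time
           + storage_multiplier[0] * primary_tb * cost_per_tb
           + maintenance_per_year[0] * years)
    high = (one_time
            + storage_multiplier[1] * primary_tb * cost_per_tb
            + maintenance_per_year[1] * years)
    return low, high


# Hypothetical inputs: 2 TB protected, $10,000 per TB, three-year horizon.
low, high = cdp_cost_range(primary_tb=2, cost_per_tb=10_000, years=3)
```

Because several inputs are floors rather than averages, the result is better read as a lower bound on the range than as a forecast.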
Action Item: CDP is not for everything but rather should be focused on those applications with recovery requirements that are the most demanding. Users should conduct a proper RTO and RPO assessment as a starting point and carefully assess the cost implications of CDP to ensure it's targeted at the right applications.
CDP for early-life application support
One of the most compelling features of “true” CDP systems is the ability to dial back to points in time before and after a system “event.” This significantly simplifies problem resolution for application developers, DB administrators and operational support staff. This is especially important during the early life of a new application.
Action item: Consider a slimmed-down “true” CDP system for supporting problem resolution during the early stages of new systems. Build in procedures that will take out the CDP support when these systems become stable and reapply the CDP system to the next generation of applications.
CDP not a general purpose B/R substitute
CDP is a relatively new class of technology that offers important returns in specific application domains. However, like any relatively new technology, CDP is likely to be oversold. Specialty CDP vendors may sell it as a general purpose backup/restore (B/R) solution; large vendors with poorly performing B/R products may attempt to reposition existing B/R tools as "CDP like," "CDP ready," or "CDP-lite." The Wikibon community recognizes the need for specific CDP capabilities in specific application environments but does not believe CDP technology is, or ever will be, a general purpose substitute for proven B/R solutions, from tape to subsystem replication. Many factors contribute to our thinking, including:
- Requirements to institute a degree of physical separation between primary and backup systems to accommodate disaster concerns.
- Significant cost differentials between CDP solutions and much cheaper, yet still high performing automated tape systems.
- The potential requirement to back up the CDP system.
- The rapid pace of invention across the B/R technology board.
Action Item: Focus CDP deployment on specific applications with demanding recovery point objectives (RPOs). Do not try to unilaterally extend CDP into B/R domains being adequately handled by mature B/R technologies.
With CDP, recovery is one thing, backup is another
Many storage observers and some in the vendor community have suggested that continuous data protection (CDP) can replace existing backup regimes. This probably does more harm than good to the topic from a marketing standpoint. CDP is aimed at high value applications and should be treated as such.
CDP debates can rage about file versus block, true CDP versus near CDP, snapshots versus snapshots plus, in-band versus out-of-band, etc. But the most useful initiatives vendors can embark upon for CDP adoption are:
- Integrate CDP into existing backup regimes; and
- Integrate CDP with specific applications (application aware CDP).
Why doesn't CDP replace existing backup approaches? First, CDP is really a recovery mechanism that uses logs to roll forward and roll back points in time. But CDP is being targeted at big, important applications that need a proper backup where everything is consistent and is backed up 100%. Second, customers for these types of applications need to have a way to get a consistent point in time backup off the site (versus a remote backup of the log) and that is what a proper backup will accomplish.
Regarding application integration, there are many useful examples. Exchange customers would like elemental recovery by dialing back for a particular user. Oracle customers can benefit from the automated recovery of a critical database and SAP users would benefit from being able to recover across a set of servers and volumes that comprise an SAP system.
While much of this work is underway (e.g. Symantec integrates with its own backup facilities) lots more needs to be done.
Action Item: CDP vendors should focus more attention on integration with existing backup regimes and make CDP more application aware. This will add much more value to the customer base than less relevant technology debates that serve more marketing hype than business value realities.
Integrating CDP and traditional backup
The sweet spot for CDP is applications with aggressive RPO and RTO requirements. One of the claims sometimes put forward for CDP is that it obviates the need for traditional backup systems; vendors have argued that CDP logs provide the sole form of recovery.
CDP systems do not provide a clean copy of recovery data. Additional steps are required to ensure that data is consistent both within and between applications. This type of recovery carries inherent risks of system and procedural failures, and the very nature of the RPO and RTO requirements means that it is unlikely that a CDP system alone can meet them.
Action item: CDP should be the preferred method of recovery but is unlikely to provide the sole backup and recovery system. CDP technology will need to be integrated into the general backup and recovery systems for the data center. Ability to integrate easily should be a major technology selection criterion.