Storage Peer Incite: Notes from Wikibon’s April 27, 2010 Research Meeting
IT infrastructure virtualization in general, and VMware in particular, has definitely become the "next big thing" in data centers worldwide. And no wonder -- virtualization offers compelling savings both in CapEx (by increasing server utilization from approximately 15% to 85%) and OpEx (by cutting the number of boxes on the floor and therefore power and cooling costs), while extending the life of data centers by cutting the demand on floor space and other limited resources.
But everything has tradeoffs, and the big problem with virtualization is data backup and recovery. Pre-virtualization backup infrastructure and methods do not work with virtualized environments, and VMware Consolidated Backup (VCB), the initial solution to this problem from VMware, was inefficient and often impractical.
The good news is that VMware now has a new solution, the vStorage APIs for Data Protection (VADP) and Changed Block Tracking (CBT). Rather than being stand-alone systems, these are designed to be integrated into existing backup systems. Together they allow efficient backup of data from virtualized environments with minimal load on IT infrastructure. Once an initial backup is complete, the system can track changes by block, so that subsequent backups involve only changed data. Restores are also much more efficient because they can start with the most recent changed data and work backwards through backup updates to restore only the data that is needed.
So far six vendors have added VADP to their products. Users of those products should upgrade to the VADP-enabled version. Customers of other vendors should ask for their schedule for VADP implementation and, if VADP is not on their upgrade list, urge them to add it. Implementing VADP backup will require changes in the backup architecture, but those are much better than not having a backup.
G. Berton Latamore
As organizations aggressively virtualize servers, backup becomes increasingly problematic. The benefits of higher server utilization and increased efficiency have a challenging side-effect, namely reducing the amount of physical resources available for CPU-intensive applications like backup.
The result is that backup windows are tightening in many shops, and users are forced to sacrifice recovery point objectives (RPO) or deploy more physical resources to accommodate backup requirements, thereby lowering the ROI impact of VMware. VMware Consolidated Backup (VCB) has been VMware’s answer – a central backup facility which is a backup proxy server, typically requiring additional physical resources and more storage space to avoid performing backups directly through the ESX host. VCB is deployed in less than 10% of VMware shops according to Wikibon estimates.
There is light at the end of the tunnel for practitioners, however. At the April 27, 2010 Wikibon Peer Incite Research Meeting, the community welcomed Henry Robinson of VMware, a Director of Product Management with responsibility for the vStorage API for Data Protection (VADP). Also on the call was Peter Imming, a product manager for Backup Exec at Symantec. The community discussed VADP, how it improves the backup situation for VMware customers, and how the VMware backup ecosystem is integrating using VADP.
Here are three slides submitted by Henry Robinson that we discussed on the call. Robinson took the Wikibon community through the following.
Figures 1 and 2: Evolution of VMware Backup
- Physical world uses shared backup server,
- Plenty of resources available for backup due to poor server utilization,
- Moving that structure to a virtual world causes immediate resource constraints,
- VMware introduces VCB - uses proxy server (often physical) to reduce VM load but not streamlined,
- VADP integrates backup through the vStorage APIs,
- Data transfers are streamlined because backup apps read the VM backup data directly, removing a full transfer,
- Changed Block Tracking (CBT) keeps track of changes that have been made and allows only the changes to be backed up (once a full backup has taken place).
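The changed-block idea in the last bullet can be illustrated with a small toy model. This is only a sketch of the general technique, not VMware's implementation or API: the `TrackedDisk` class and its methods are hypothetical names invented for this example, and a real hypervisor tracks changes below the guest rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""
    blocks: list
    changed: set = field(default_factory=set)

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)  # the tracking step: mark this block as dirty

    def full_backup(self):
        """Copy every block and reset the change tracker."""
        self.changed.clear()
        return dict(enumerate(self.blocks))

    def incremental_backup(self):
        """Copy only the blocks written since the previous backup."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(blocks=[b"a", b"b", b"c", b"d"])
full = disk.full_backup()         # transfers all 4 blocks
disk.write(2, b"C")
disk.write(2, b"CC")              # rewriting the same block is tracked only once
incr = disk.incremental_backup()  # transfers only block 2
print(len(full), "blocks in the full backup,", len(incr), "in the incremental")
```

In this model the incremental transfer shrinks from four blocks to one, which is the I/O reduction the bullet describes.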
"Backup is one thing, recovery is everything" (Fred Moore). Restore is still evolving and two models are emerging:
- Restore of full image because a VM has been lost, and,
- Restore of a single file.
Bottom line. Son of VCB (VADP) is better but more work needs to be done to evolve recovery.
Figure 3: Ecosystem Partners Supporting VADP (as of 4/2010)
In addition, we discussed data deduplication on the call and how backup vendors are adding value in this regard. The dedupe debate has focused on the degree to which a customer is I/O bound and the degree to which source-side deduplication can eliminate backup bottlenecks. To the extent that source-side dedupe can be shown to eliminate I/O bottlenecks and increase ROI, the benefits are clear; however, customers must rip and replace existing backup software.
Increasingly, data deduplication is being offered at the source, embedded directly into traditional backup software, although this approach is somewhat less mature than source-side products that have been on the market for years (e.g. Avamar). Nonetheless, the trends toward deduplication embedded within existing backup apps and the introduction of CBT ease the pressure on users to choose between rip and replace and lower ROI. Specifically, a combination of, for example, Symantec Backup Exec and Data Domain can deliver much better ROI with VADP than it could with VCB because the I/O load is reduced.
The introduction of VADP allows practitioners to use existing backup infrastructure, consoles, and processes with the new capabilities found in the API such as CBT. The re-design of existing processes is fairly straightforward but is optimized for a vCenter view displaying vCenter console folders, avoiding the need to browse multiple machines.
There may still be benefits of re-architecting backup and moving to a new backup software model (i.e. rip and replace) to provide more granularity, even less I/O activity and perhaps some advantages on recovery, but VADP extends the value proposition for the existing backup software suppliers supporting the API.
The key is a backup application that exploits VADP and ideally CBT. Vendors can receive VADP certification and, as shown in Figure 3, six vendors have currently been VADP certified, with more expected later this year. With VADP and CBT, users can restore from the latest incremental and work backwards in time through differentials to restore only the necessary blocks.
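The "work backwards" restore described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular backup product stores or reads its data: the `restore_blocks` function and the dictionary layout (block index mapped to block contents) are hypothetical.

```python
def restore_blocks(wanted, full, incrementals):
    """Restore only the `wanted` blocks, consulting the newest backup first.

    `full` maps block index -> data for every block. `incrementals` is a list
    of similar maps, ordered oldest to newest, each holding only changed blocks.
    """
    remaining = set(wanted)
    restored = {}
    for delta in reversed(incrementals):    # newest backup first
        for index in delta.keys() & remaining:
            restored[index] = delta[index]  # the most recent copy wins
        remaining.difference_update(delta)  # these blocks are now resolved
        if not remaining:
            break                           # older backups need not be read
    for index in remaining:                 # unchanged since the full backup
        restored[index] = full[index]
    return restored


full = {0: "a0", 1: "b0", 2: "c0", 3: "d0"}
incrementals = [{1: "b1"}, {1: "b2", 3: "d2"}]
result = restore_blocks([0, 1], full, incrementals)
print(result)
```

Block 1 comes from the newest incremental and block 0 falls through to the full backup; blocks 2 and 3 are never read because they were not requested.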
Action item: VMware backup has been a challenge for users, with VMware Consolidated Backup (VCB) providing a less than adequate solution. VADP addresses many of VCB's shortcomings, especially with Changed Block Tracking (CBT). Users should push vendors for visibility on plans for VADP integration and plans to exploit emerging capabilities to reduce I/O loads and streamline backup.
Backup is governed by three vectors:
- RPO – Recovery Point Objective, the amount of data that is lost when a recovery from backup is invoked (worst case a disaster and best case recovery from a log file),
- RTO – Recovery Time Objective, the time it takes to restore the service and restore the application to full functionality,
- Backup Cost – the total cost of backup, including staffing, software and hardware.
In moving to a virtualized world, all these vectors can change. A virtualized server environment may be more efficient, but the lack of resources for backup and (usually more importantly) recovery will impact RPO and RTO. Data de-duplication may reduce cost and RTO but impact RPO.
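A back-of-the-envelope model can make the RPO and RTO vectors concrete. The sketch below is a deliberate simplification, and every figure in it is hypothetical: it assumes worst-case data loss equals the interval between backups, and that restore time is a fixed setup overhead plus data volume divided by restore throughput.

```python
def worst_case_rpo_hours(backup_interval_hours):
    """Data written right after a backup stays unprotected until the next one."""
    return backup_interval_hours


def estimated_rto_hours(restore_gb, restore_gb_per_hour, fixed_overhead_hours=0.0):
    """Restore time = fixed setup work + data volume / restore throughput."""
    return fixed_overhead_hours + restore_gb / restore_gb_per_hour


# Hypothetical figures: nightly backups, 200 GB/hour restore rate, 30 minutes of setup.
rpo = worst_case_rpo_hours(24)
rto_full_image = estimated_rto_hours(800, 200, fixed_overhead_hours=0.5)
rto_changed_only = estimated_rto_hours(100, 200, fixed_overhead_hours=0.5)
print(f"RPO {rpo} h, RTO full image {rto_full_image} h, changed blocks only {rto_changed_only} h")
```

Under these assumptions, shortening the backup interval improves RPO, while restoring less data improves RTO; both typically raise the third vector, cost.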
With its first attempt at providing a backup and recovery infrastructure, VMware introduced VMware Consolidated Backup (VCB) and struck out by increasing all three vectors. For most virtual installations, backup and recovery were serious constraints on the deployment of virtualization, and a rip-and-replace of the backup software infrastructure was a costly and risky alternative.
With the introduction of the vStorage APIs for Data Protection (VADP) and Changed Block Tracking (CBT), VMware has scored a home run. The backup software industry can now retrofit effective support for VMware environments to its traditional backup products. The current list of backup products that support VADP is:
- ARCserve from CA,
- Avamar from EMC,
- NetBackup & Backup Exec from Symantec,
- Simpana from Commvault,
- Tivoli Storage Manager from IBM (image backup still requires VCB),
- Veeam Backup from Veeam,
- vRanger from Vizioncore.
VMware says that other vendors have product updates incorporating VADP in the pipeline for 2010 delivery.
However, significant redesign of backup and recovery procedures and backup software options together with testing will still be required to implement an effective solution that will meet current RPO and RTO objectives within the same cost envelope.
Action item: CIOs and CTOs managing the migration to VMware should not rip and replace their existing backup software strategy unless their vendor has no plans to move to VADP and CBT. As and when there is good support for VADP and CBT, organizations should invest the resources to redesign their existing backup infrastructure and optimize it for VMware to ensure that the vectors of RPO, RTO, and cost are properly balanced.
Since VMware came on to the IT scene and ushered in the virtualization wave sweeping the IT industry, there have always been discussions around how best to back up the larger and larger numbers of virtualized systems. Do you just put a regular backup “Agent” in each Guest virtual machine and back it up like a physical system? Do you use some scripting method to shut down your virtual machines before backup with an Agent on the ESX host? How about a hardware snapshot solution? What about VMware’s own backup technology, VMware Consolidated Backup? What about deduplication for VMware data?
It has now become clear that most of these questions have a single answer: VMware vStorage API for Data Protection, or “VADP”. Having been in the backup industry for over 15 years, I can safely say that instances where a backup technology delivers “all the right stuff” of better performance, lower CPU utilization, and lower storage costs, while being easier to implement and use, are few and far between. However, the folks at VMware have accomplished just that. The vStorage APIs for Data Protection are a foundation provided by VMware for third-party backup vendors such as Symantec to incorporate into their backup applications to protect VMware virtual machine data. VADP is not a backup application by itself, but a set of software libraries that we (the backup vendors) build into our products.
Why is VADP the answer to so many of the questions I mentioned above? In short, it simply works! It is fast, efficient, and very easy to set up.
There is nothing extra for a customer to install or configure. No VCB “framework” or “VCB Proxy Server” or temporary storage to configure or install. Just simply install the backup software that has the vStorage API for Data Protection integration support.
If you are using VMware’s older VCB technology, you should see an immediate 7-10X improvement in your backup performance over a modern SAN (4-8 Gb/s). The best news is that you do not have to be on vSphere 4.0 to take advantage of this capability. It is fully backward compatible to ESX 3.5 Update 2. The story only gets better from there if you are running a vSphere 4.0 environment. vSphere 4.0 introduces a new feature on top of VADP: Changed Block Tracking. No longer are backups of virtual machines relegated to just “Full” backups. Now high-speed block-level Incremental and Differential backups are possible as well.
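The difference between block-level incremental and differential backups can be shown with a short sketch. The change log and function names here are hypothetical and chosen only for illustration; the definitions used are the conventional ones (an incremental captures changes since the previous backup, a differential captures changes since the last full backup).

```python
# Hypothetical change log: day number -> set of block indexes written that day.
# Day 0 is the full backup; one backup runs at the end of each later day.
writes_by_day = {1: {3, 7}, 2: {7, 9}, 3: {1}}


def incremental_blocks(day):
    """Blocks changed since the previous day's backup."""
    return set(writes_by_day.get(day, set()))


def differential_blocks(day):
    """Blocks changed since the day-0 full backup."""
    changed = set()
    for d in range(1, day + 1):
        changed |= writes_by_day.get(d, set())
    return changed


print("day 3 incremental:", sorted(incremental_blocks(3)))
print("day 3 differential:", sorted(differential_blocks(3)))
```

The incremental is smaller to take, but a restore needs the full backup plus every incremental since; the differential grows each day, but a restore needs only the full backup plus the latest differential.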
VMware laid the foundation with the vStorage APIs for Data Protection; now it is up to the backup vendors to build on top of it. At Symantec, we think we have done a pretty good job at that. With the release of Backup Exec 2010 and NetBackup 7, Symantec has built additional features on top of VADP and was one of the first vendors to meet VMware’s thorough “VMware Ready for Data Protection” certification requirements. Backup Exec 2010 adds a number of enhancements to protect your virtual machines and reduce your overall storage costs, including data deduplication and single-pass backup with granular recovery of virtualized applications such as Microsoft Exchange, SQL, and Active Directory.
Action item: Whatever backup software you use today for physical or virtual system backup, make sure it is fully using vStorage APIs for Data Protection-certified methods for both backup AND recovery of your VMware 3.5 and 4.0 environments. If it is not, push your backup vendor to use VADP and become certified. Look for backup solutions that do not require a “rip and replace” or a separate backup product just to protect your virtual machines. Look for a backup application that you trust to handle both physical and virtual machines from a single console. Look for a backup product that is going to help your investment in virtualization really pay off by reducing your backup and storage costs through deduplication and single-pass backups of your virtualized machines and applications. If your backup vendor is unable to help you with these questions, look around for vendors who will.
Footnotes: Symantec Backup Exec
For effective implementation of virtual-server solutions such as VMware, application, database, server, networking, storage, and backup administrators all must be retrained. With multiple applications sharing physical server, network, and storage resources, and with applications moving dynamically from one physical server to another, well-established processes and methods based upon a one-application-per-server model must be re-examined.
There has always been a need to have well-defined roles and responsibilities among the IT operations team, and virtualization doesn't change that. In a virtualized environment, the VMware administrator is charged with ensuring that physical servers are optimally utilized, that new virtual servers are quickly provisioned, and that those applications that require higher availability are placed onto servers that provide automated virtual-server failover. Meanwhile, the application administrator continues to be charged with ensuring application performance and scalability. And finally, the backup administrator is charged with ensuring that recovery point objectives (RPOs) and recovery time objectives (RTOs) for applications, as defined in the business rules, are met.
Virtualization places new pressures on all administrator roles. The backup administrator, used to deploying backup agents on physical servers, may be unaware of agentless backup methods that are less disruptive and offer higher performance. Administrators, used to backing up physical servers, may not know how to perform backups for virtual servers or how to schedule a backup in such a way as to avoid impacting the performance of other applications residing on the same physical server. In today's virtualized environment, the physical resources are shared and the applications are dynamic. The VMware administrator may have no awareness of how the movement of existing applications from one server to another or the deployment of new applications on an existing virtualized server affects the ability of the backup administrator to complete timely backups. The application administrator, used to having tight control over the timing of physical adds, moves, and changes to ensure that service levels for application availability and performance are met, now must deal in a world where other applications or processes, such as backups, can unexpectedly impact application performance.
Action item: Companies that want to make effective use of virtualization with production applications should follow a three-step action plan to avoid unintended negative consequences:
- Re-examine all tools and processes to determine if they will continue to work in a dynamic, shared, virtual-server environment;
- Evaluate new backup and data-protection methods, along with application and infrastructure management and monitoring tools that are specifically designed for virtual-server environments;
- Establish a multi-disciplinary team that includes the virtual-server administrator, application administrator, and backup administrator, and task them with ensuring that production applications continue to meet the business unit's requirements for application performance, availability, and recoverability, from the standpoints of both recovery time objectives (RTO) and recovery point objectives (RPO).
Many backup providers have already made the successful transition to the VADP (vStorage API for Data Protection) architecture, the new data backup system for virtual machines from VMware. Already, CA, CommVault, EMC, Symantec, Veeam, and Vizioncore have delivered fully VADP-integrated solutions. We encourage IBM Tivoli to complete its transition to a full VADP solution, since Tivoli Storage Manager still uses VCB for image backups. We challenge the rest of the backup community to complete, certify, and release their integrations with VADP. VADP-integrated backup solutions fulfill the user requirements to maintain and improve backup and recovery objectives in a virtualized environment, which the VCB technology pioneered.
But the work is not done. While this successful integration of the backup ecosystem with virtualization has yielded technology improvements such as VADP, we can do more. Customer efficiency can further improve with new initiatives such as improved snapshot technology, integrated de-dupe, cloud enablement, and self-service restore.
Action item: Complete your integration of solutions such as VADP into your backup and protection products. Make these solutions widely available to your customers so that they can be easily integrated into currently deployed environments.
By all accounts, the architecture of VMware’s Consolidated Backup (VCB) has not kept pace with user requirements to maintain and improve backup and recovery objectives, capabilities, and costs as data centers become virtualized.
Wikibon estimates that less than 10% of VMware users implement VCB to back up their virtual machines.
Performance and server/storage costs imposed by the VCB architecture are the primary reasons. For example, VCB requires that the ESX server make a copy of the data associated with each VM in a sequential manner. In the case where 50 VMs share a physical host, this means that the ESX server will copy the files associated with VM1 to a separate storage area, then the files for VM2, then sequentially continue the process to VM50. Those snapshots are then mounted to a proxy server, on which the actual backup software runs. The penalty is both in terms of time (the sequential operation of creating the snapshots) and the costs associated with the number of proxy servers required to back up a large ESX environment. For many users, the performance and storage requirements architected into VCB work against the grain as IT managers look to simplify their computing platforms and deliver increased value to business and application owners.
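The time and proxy-server penalty of the sequential copy can be estimated with simple arithmetic. The sketch below is illustrative only: the VM sizes, copy rate, and backup window are hypothetical figures, and it assumes that spreading the work evenly across proxy servers shortens the elapsed time proportionally.

```python
import math


def sequential_copy_hours(vm_sizes_gb, copy_gb_per_hour):
    """Time to copy every VM one after another (no overlap between VMs)."""
    return sum(vm_sizes_gb) / copy_gb_per_hour


def proxies_needed(total_hours, backup_window_hours):
    """Proxy servers required if the work is split evenly to fit the window."""
    return math.ceil(total_hours / backup_window_hours)


# Hypothetical figures: 50 VMs of 40 GB each, 200 GB/hour copy rate, 4-hour window.
vm_sizes_gb = [40] * 50
total_hours = sequential_copy_hours(vm_sizes_gb, copy_gb_per_hour=200)
proxies = proxies_needed(total_hours, backup_window_hours=4)
print(f"{total_hours:.1f} hours sequentially, {proxies} proxies to fit the window")
```

With these made-up numbers a single sequential pass takes 10 hours, so fitting a 4-hour window would take three proxies. Copying only changed blocks attacks the same problem by shrinking the data volume instead of adding hardware.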
The good news is that VADP (vStorage API for Data Protection) replaces VCB as a native API (no additional software required). VADP integrates directly with certified backup software (e.g., CA ARCserve, Symantec NetBackup and Backup Exec, EMC Avamar, Vizioncore vRanger, IBM Tivoli Storage Manager) and, among other things, enables you to remove a workload from the ESX server by consolidating backup load and management onto a central backup server environment. Changed Block Tracking (CBT) is also in the "good news" category for VM backup and replication efficiency. CBT enables the ESX server to determine directly what has changed between the source and target and then to copy only those changed blocks.
Action item: Sunset your VCB infrastructure and begin the migration to VADP and CBT. Reallocate storage tied to VCB file and image snapshots back to the storage pool and reduce backup proxy server requirements to a single VADP-enabled backup VM within an ESX host (not within the individual VMs hosting guest OS images and applications). Add VADP support and certification to your service level agreement with your backup software vendor.