The Time Has Come for Backup as a Service…
[Note: This is an expanded version of a transcript I wrote for a recent video. It contains additional detail that I didn’t have time to explore in the short segment.]
I’d like to address one of the most pressing problems that have plagued IT since the dawn of computing…backup.
What’s wrong with backup today?
I’ve been around a long time, and for as long as I can remember, backup and recovery have been among the most challenging issues facing practitioners. If you’ve spent any time in backup, I’m sure you agree. There are six reasons I want to explore as to why backup is so problematic; I’m sure you can add some nuances of your own.
1. In and of itself, backup delivers no tangible business value: it’s insurance.
2. Backup is expensive insurance. For every $1 spent on primary storage, 50 to 60 cents is spent on backup and recovery. (Note: I’ve previously cited a major research firm that told me the ratio was 1:4, but I just can’t make the numbers work.) Part of the spend goes to CAPEX, but the real drain is OPEX, including annual maintenance costs.
3. Backup is too complicated. I’ll drill into this further below, but backup today is a hardened process. Once it’s in place, it’s so complex and fragile that no one wants to change it, for fear of creating a nightmare scenario in which the organization goes unprotected.
4. Data growth is outpacing advances in backup, and the growth of unstructured data means more files, videos, wikis, SharePoint docs, email attachments, etc., all growing at different rates and with different data protection needs.
5. Despite this diversity, backup is often a one-size-fits-all capability in which lower-value applications get the same treatment as higher-value apps. The result is either inefficiency (overprotecting low-value apps) or, in the reverse case, undue risk (underprotecting high-value ones).
6. Businesses are demanding better service levels, not “one size fits all,” and they’re mandating that IT become more “cloud-like.” Today, organizations can provision servers in minutes and launch a new business initiative almost overnight. As the speed of business evolves, so must backup. It’s no longer acceptable to wait a day or even hours to get data back; users want their data instantly.
Why is Backup so Complicated?
Well, there’s an answer for this. Backup was originally conceived as a batch task done at the end of the day. Back when production systems ran for eight hours a day, there was a substantial window in which to complete backups. As that window shrank, backup evolved to copy only incremental changes, but still at the end of the day, still basically one-size-fits-all, and increasingly complex. There are variations on this theme, but generally this is how it works. It’s outdated and, some would argue, just not the right way to do backup.
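To make the end-of-day model concrete, here is a minimal sketch of the selection logic behind a nightly full-plus-incremental scheme. It assumes a simple file-level approach in which each file’s last-modified time is compared with the time of the previous backup run; the `FileEntry` type, the `select_for_backup` function and the sample paths and timestamps are all hypothetical, and real backup applications track changes in their own (often proprietary) catalogues.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    mtime: float  # hypothetical last-modified time, seconds since the epoch


def select_for_backup(files, last_backup_time, full=False):
    """Return the paths to copy in tonight's batch window.

    A full backup copies everything; an incremental copies only the files
    modified since the previous run. Every file gets the same treatment,
    which is the one-size-fits-all behaviour described above.
    """
    if full:
        return [f.path for f in files]
    return [f.path for f in files if f.mtime > last_backup_time]


# Hypothetical file list and timestamps, for illustration only.
files = [
    FileEntry("/data/orders.db", 1_700_000_500.0),
    FileEntry("/data/readme.txt", 1_699_000_000.0),
    FileEntry("/data/report.xlsx", 1_700_000_900.0),
]
print(select_for_backup(files, last_backup_time=1_700_000_000.0))
# ['/data/orders.db', '/data/report.xlsx']
```

Note that nothing here knows which application a file belongs to or how valuable it is; the only knob is “full or incremental, once a day.”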
What’s more, backup has a ton of moving parts. Consider an Oracle environment today. You have a database and an application using that database. There are interdependent applications as well as file shares that also need to be protected. There’s a backup utility within the database (say, RMAN), a backup server running a backup application that orchestrates the backup and recovery flow for this environment, and of course a target backup device and probably a replicated device off site. Each of these elements has metadata locked inside it, there are backup agents on the servers, and the format of the backed-up data is proprietary to the backup application. The whole stack is essentially locked in.
The picture I’m painting is like a hardened block of cement. It’s no wonder backup teams don’t want to change anything: if you touch anything, it might break. And it’s essentially a one-size-fits-all approach, meaning “it’s too expensive to tailor this, so take it or leave it.” As such, every app in this environment probably gets backed up the same way, unless you can afford or justify something like SRDF (EMC’s array-based Symmetrix Remote Data Facility replication).
And at a larger company, the backup team has probably built one such stack for VDI backup, another for Exchange and perhaps another for SharePoint, all customized with some reusable content but adding up to anything but a simple picture.
How Must Backup Evolve?
New technology enablers such as virtualization and cloud computing, combined with the notion of perpetual incremental backups using snapshots (after an initial full copy, only changes are captured and no further full backups are needed), are changing the way organizations think about backup and recovery. Backing up a server is no longer the most viable approach. Rather, we are moving to a model where a VM admin, an Oracle DBA or a Microsoft IT pro can access a catalogue of services and construct a backup approach for a specific application.
This vision requires new thinking about data protection, in which backup is a service and the associated components of that service (e.g., the server image, the database, other related services) are protected as a whole rather than as a bespoke set of resources.
Essentially, the most forward-thinking IT organizations I speak with want to move backup from a task to a service. What does that mean? It means that instead of bolting backup onto an application deployment as an afterthought, backup should be an intrinsic part of the business process. Application heads or business users should have the option to consume data protection as a service whose attributes are “tailorable” to the needs of the business.
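Why are snapshot-based perpetual incrementals cheap enough to take many times a day? A minimal sketch, assuming a redirect-on-write design (one of the two common snapshot designs; copy-on-write is the other, and which one a given array, appliance or software product uses varies). The `Volume` class and its methods are hypothetical and model blocks as Python strings. The point it shows: taking a snapshot copies only a small block map, and each new write consumes one new block, so unchanged data is shared between the live volume and every snapshot. Application consistency (quiescing the database before the snap) is not modeled.

```python
class Volume:
    """Toy block volume with space-efficient, point-in-time snapshots."""

    def __init__(self):
        self._pool = {}       # pool key -> block contents (each stored once)
        self._next_key = 0
        self._live = {}       # logical block number -> pool key
        self.snapshots = {}   # snapshot name -> frozen copy of the block map

    def write(self, block_no, data):
        # New data always lands in a fresh pool slot; existing snapshots keep
        # pointing at the old slot, so no data is copied at write time.
        # Superseded blocks that no snapshot references are never freed here;
        # a real implementation would garbage-collect them.
        self._pool[self._next_key] = data
        self._live[block_no] = self._next_key
        self._next_key += 1

    def snapshot(self, name):
        # A snapshot copies only the block map, not the block contents.
        self.snapshots[name] = dict(self._live)

    def read(self, block_no, snapshot=None):
        block_map = self._live if snapshot is None else self.snapshots[snapshot]
        return self._pool[block_map[block_no]]

    def blocks_stored(self):
        return len(self._pool)


vol = Volume()
for i in range(4):
    vol.write(i, f"v0-{i}")
vol.snapshot("09:00")
vol.write(2, "v1-2")      # only one block changes between snaps
vol.snapshot("12:00")
print(vol.read(2, "09:00"), vol.read(2, "12:00"), vol.blocks_stored())
# v0-2 v1-2 5
```

Two point-in-time copies of a four-block volume cost five stored blocks rather than eight, which is what makes frequent, per-application snaps practical.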
How Will Backup as a Service Become a Reality?
At Wikibon, we’ve identified four high-level components needed to realize this vision:
- Consistent, space-efficient snapshots. This concept was a huge technological breakthrough for backup because it allows incremental snaps to be taken periodically throughout the day. It also introduces the notion that a given set of snapshots applies to a specific application, rather than a one-size-fits-all backup approach. While many will associate snaps with expensive array-based solutions, arrays aren’t a requirement: specialized appliances and software-based solutions can certainly play in this game.
- A new open orchestration layer. Today, agents are part of, and controlled by, the backup software. We envision separating the agents out into a new layer with a set of open APIs that allow it to manage resources or be managed by other systems. Two key components of this layer are:
  - The metadata or content catalogue: a very fast, efficient database (ideally open-source NoSQL) that is consistent, transparent and accessible to outside resources; and
  - A services catalogue that practitioners access to build backup services. It relies on a policy engine that manages the services required to meet the specified SLAs.
- Protecting data in native format. Today, backup data is stored in proprietary formats that, while delivering some efficiency and performance benefits, are mostly designed to lock customers in and get backup vendors paid more. Going forward, there’s little question in our view that backup data should be stored in native format and be accessible for a variety of use cases: not just backup, but potentially other systems such as the data warehouse, archives or big data initiatives.
- Rich dashboards of the kind you’d expect from a services-oriented approach. Specifically, a monitor that communicates the health of the backup system, how services are being consumed, whether SLAs are being met, chargebacks or showbacks, problem notification and remediation progress, etc.
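A minimal sketch of how the services catalogue, its policy engine and the dashboard’s “are SLAs being met?” check might fit together. Everything here is hypothetical: the `ServiceTier` type, the bronze/silver/gold tiers, their snapshot intervals, retention periods and per-GB rates are illustrative assumptions, not Wikibon figures or any product’s API. The sketch also assumes the SLA is expressed as a recovery point objective, i.e. the largest tolerable gap between usable snapshots.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ServiceTier:
    name: str
    snapshot_interval: timedelta  # how often snaps are taken
    retention: timedelta          # how long snaps are kept
    cost_per_gb_month: float      # hypothetical internal rate


# Hypothetical services catalogue; all values are illustrative only.
CATALOGUE = [
    ServiceTier("bronze", timedelta(hours=24), timedelta(days=30), 0.02),
    ServiceTier("silver", timedelta(hours=4), timedelta(days=60), 0.05),
    ServiceTier("gold", timedelta(minutes=15), timedelta(days=90), 0.12),
]


def cheapest_tier_for(rpo: timedelta) -> ServiceTier:
    """Policy engine: the least expensive tier whose snap interval meets the RPO."""
    candidates = [t for t in CATALOGUE if t.snapshot_interval <= rpo]
    if not candidates:
        raise ValueError(f"no catalogue tier can meet an RPO of {rpo}")
    return min(candidates, key=lambda t: t.cost_per_gb_month)


def rpo_met(snapshot_times, rpo: timedelta, now: datetime) -> bool:
    """Dashboard check: no gap between snaps, or since the last snap, exceeds the RPO."""
    times = sorted(snapshot_times)
    if not times:
        return False
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(now - times[-1])
    return max(gaps) <= rpo


tier = cheapest_tier_for(timedelta(hours=6))
print(tier.name)  # silver

start = datetime(2024, 1, 1, 0, 0)
snaps = [start + timedelta(hours=h) for h in (0, 4, 8, 16)]  # a snap was missed at 12:00
print(rpo_met(snaps, timedelta(hours=6), now=start + timedelta(hours=18)))  # False
```

The design choice worth noting is that the application owner asks for an outcome (an RPO) and the policy engine, not the backup team, maps it to a concrete schedule; the same RPO then drives the compliance flag on the dashboard.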
This service-oriented model is designed to take an application view rather than a cemented, environment-wide or server-by-server view. Here, an application owner, DBA or VM admin defines the service level, the frequency of snaps, the backup and recovery policy, the retention period, etc., all through the services catalogue of the orchestration layer. The cost of the service to users is a function of the service level and the volume of data protected, and it is shared transparently throughout the organization.
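As a rough illustration of that cost model, the sketch below prices each protection request as the service-level rate times the volume protected and rolls the charges up per owner for a showback report. The `ProtectionRequest` fields mirror the attributes listed above, but the type, the function names, the rates and the sample applications are all hypothetical assumptions. A real chargeback model might also price snap frequency and retention, which this sketch records but deliberately ignores, since the text describes cost as a function of service level and volume only.

```python
from dataclasses import dataclass

# Hypothetical monthly rate per protected GB for each service level.
RATE_PER_GB_MONTH = {"bronze": 0.02, "silver": 0.05, "gold": 0.12}


@dataclass
class ProtectionRequest:
    """What an application owner, DBA or VM admin defines in the services catalogue."""
    application: str
    owner: str
    service_level: str   # key into RATE_PER_GB_MONTH
    snaps_per_day: int
    retention_days: int
    protected_gb: float


def monthly_charge(req: ProtectionRequest) -> float:
    """Chargeback: service-level rate multiplied by the volume protected."""
    return round(RATE_PER_GB_MONTH[req.service_level] * req.protected_gb, 2)


def showback_report(requests):
    """Total monthly charge per owner, so costs are visible across the organization."""
    totals = {}
    for req in requests:
        totals[req.owner] = round(totals.get(req.owner, 0.0) + monthly_charge(req), 2)
    return totals


reqs = [
    ProtectionRequest("order-entry", "finance", "gold", 96, 90, 500),
    ProtectionRequest("intranet-wiki", "hr", "bronze", 1, 30, 200),
    ProtectionRequest("ledger-reports", "finance", "silver", 6, 60, 1000),
]
print(showback_report(reqs))  # {'finance': 110.0, 'hr': 4.0}
```

Because the price is tied to the service level the owner chose, a low-value wiki no longer pays (or costs IT) the same as an order-entry system.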
How Should Practitioners Plan to Implement a Services-Oriented Backup Approach?
First, we recommend starting with the organizational and business issues and sorting those out before jumping into the technology. Specifically, IT organizations should set objectives and map backup into their overall IT transformation plans. Data protection should be a fundamental component of these plans, not an afterthought or a “bolt-on.” The goal should be both to simplify the process and to design in the flexibility and transparency such that application owners, sys admins or whomever the organization charters with the backup process can consume backup as a service, with fine-grained recovery point objectives (RPOs: how much recent data can be lost) and recovery time objectives (RTOs: how quickly service must be restored) based on the value of the applications to the business.
[BTW: I’m not a huge fan of speaking to the business in RPO and RTO terms; I think it’s too geeky and confusing. I’d much rather have a conversation about the value of the applications to the business, what happens to user productivity when applications or data are not accessible, how dependent the organization or department is on the app, etc., and then map this into RPO and RTO terms.]
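One way to do that mapping is to capture the answers from the business conversation first and translate them into RTO and RPO afterwards. The sketch below is a hypothetical example of such a translation: the `BusinessImpact` questions, the dollar and user-count thresholds and the resulting targets are illustrative assumptions that each organization would set with its own business owners, not recommended values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BusinessImpact:
    """Answers from the business conversation, before any RTO/RPO jargon."""
    users_affected: int            # people who can't work when the app is down
    revenue_loss_per_hour: float   # estimated cost of the app being unavailable
    can_recreate_lost_data: bool   # e.g. can transactions be re-keyed from source documents?


@dataclass
class RecoveryTargets:
    rto: timedelta  # how quickly the app must be back
    rpo: timedelta  # how much recent data the business can afford to lose


def to_recovery_targets(impact: BusinessImpact) -> RecoveryTargets:
    # Hypothetical thresholds: the costlier or more widely used the app,
    # the faster it must be restored.
    if impact.revenue_loss_per_hour >= 10_000 or impact.users_affected >= 1_000:
        rto = timedelta(hours=1)
    elif impact.revenue_loss_per_hour >= 1_000 or impact.users_affected >= 100:
        rto = timedelta(hours=8)
    else:
        rto = timedelta(hours=48)
    # Data that can't be recreated needs tighter recovery points.
    if impact.can_recreate_lost_data:
        rpo = timedelta(hours=24)
    elif rto <= timedelta(hours=1):
        rpo = timedelta(minutes=15)
    else:
        rpo = timedelta(hours=4)
    return RecoveryTargets(rto=rto, rpo=rpo)


order_entry = BusinessImpact(users_affected=400, revenue_loss_per_hour=25_000,
                             can_recreate_lost_data=False)
wiki = BusinessImpact(users_affected=50, revenue_loss_per_hour=0.0,
                      can_recreate_lost_data=True)
for app in (order_entry, wiki):
    targets = to_recovery_targets(app)
    print(targets.rto, targets.rpo)
# 1:00:00 0:15:00
# 2 days, 0:00:00 1 day, 0:00:00
```

The business never has to hear the acronyms; it answers questions about productivity, revenue and dependency, and IT does the translation.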
Technically, we don’t believe that a services-oriented backup approach has to be a rip-and-replace exercise. Because it’s open, we see the orchestration layer as a component that can be slotted into the existing backup environment. Yes, this is another point of control, and one could argue it adds even more complexity in the near term, but the idea is to take it one step at a time. Start with an application suite and begin to migrate the existing pieces into the orchestration layer. Pick areas of pain: for example, an application in a database environment that is “underprotected,” where it would cost too much to apply the higher service level this app requires across the board. Learn from this and move on to the next area. Over time, your Oracle environment, Exchange, data warehouse, VDI, etc. will evolve to a service-oriented backup approach where the services are consumable, transparent and configurable at a granular level.
It just feels like a better approach with tangible business value versus today’s backup extortion model.
I hope someone can step up and build this before the end of the decade…