Disaster recovery (DR) is a difficult subject to bring up with most CFOs. It's easy to lose a business person's attention while trying to convince them to spend a large sum of money to mitigate the effects of a disaster that may or may not happen before the investment itself becomes outdated. Simply put, most CFOs are more concerned about surviving the next quarter than about surviving an unpredictable and possibly unlikely disaster scenario. The solution: preventing glazed-over gazes from C-level executives when describing a DR system is as easy as saying PRODUCTIVITY.
The new solution we proposed automated the entire backup procedure, eliminating the need for engineers at remote sites to perform backups manually and freeing their time for their primary responsibilities. It was also much more reliable than the old manual process. Prior to the project, only 40% of Strand’s backups succeeded. After the project, Strand experienced a 90% success rate.
Decreasing the overall impact of a disaster is the key purpose of any DR project, so it should be the primary investment justification. The most important metrics to present are the recovery point objective (RPO) and the recovery time objective (RTO), but they need to be presented in clear business terms:
- RPO - The amount of completed work that will be lost and need to be redone.
- RTO - The amount of time it will take before our employees can start working after a disaster.
- Total Impact - RPO + RTO + Time it takes to redo the lost work.
Typically, the man-hours lost between the last recovery point and the disaster (RPO) were productive work hours that would have made the company money in some way. Additionally, no productive work will be completed until the systems are restored (RTO), so those hours are lost as well. The productivity cost of a disaster can then be calculated:
- Productivity Loss = RPO + RTO
The cost of the loss can then be calculated by multiplying the productivity loss by the combined hourly rates of all the individuals affected by the disaster:
- Cost = Productivity Loss x Total Hourly Rate of Affected Employees
Once the impact of a disaster is established in general terms, it is important to put those terms into real dollars. First, calculate the RPO of the current DR system based on a worst-case scenario. In Strand's case, there was an incident in which a tape drive broke and wasn't replaced for three days. RPO was calculated from the time the last job finished (1:00 a.m. Monday) to the time the next job finished (1:00 a.m. Thursday), which represented an RPO of 72 hours, 24 of which were working hours. The RTO is the amount of time needed to acquire, rebuild, and deploy critical servers plus the amount of time needed to restore the associated data, which Strand calculated to be 29 hours, 8 of which were working hours. The cost of a disaster can then be estimated in real dollars:
- Cost = (24 work hours lost + 8 work hours lost while recovering data) x Total Hourly Rate of Affected Employees
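The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 9:00-to-17:00, Monday-to-Friday working day, the calendar dates, and the $2,000 combined hourly rate are hypothetical values chosen to make the example concrete, not figures from Strand. The function names `working_hours_between` and `disaster_cost` are likewise invented for this sketch.

```python
from datetime import datetime, timedelta

# Assumption: an 8-hour working day, 9:00-17:00, Monday through Friday.
WORK_START_HOUR = 9
WORK_END_HOUR = 17


def working_hours_between(start: datetime, end: datetime) -> float:
    """Count the working hours that fall inside the window [start, end)."""
    total = 0.0
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < end:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            shift_start = day + timedelta(hours=WORK_START_HOUR)
            shift_end = day + timedelta(hours=WORK_END_HOUR)
            # Overlap between this day's shift and the outage window.
            overlap_start = max(start, shift_start)
            overlap_end = min(end, shift_end)
            if overlap_end > overlap_start:
                total += (overlap_end - overlap_start).total_seconds() / 3600
        day += timedelta(days=1)
    return total


def disaster_cost(rpo_work_hours: float, rto_work_hours: float,
                  total_hourly_rate: float) -> float:
    """Cost = (RPO work hours + RTO work hours) x total hourly rate."""
    return (rpo_work_hours + rto_work_hours) * total_hourly_rate


if __name__ == "__main__":
    # Hypothetical dates: 1 January 2024 is a Monday, 4 January a Thursday.
    last_good_backup = datetime(2024, 1, 1, 1, 0)   # 1:00 a.m. Monday
    next_good_backup = datetime(2024, 1, 4, 1, 0)   # 1:00 a.m. Thursday
    rpo_work_hours = working_hours_between(last_good_backup, next_good_backup)

    rto_work_hours = 8           # working hours within the 29-hour RTO
    total_hourly_rate = 2000.0   # hypothetical combined rate, in dollars

    cost = disaster_cost(rpo_work_hours, rto_work_hours, total_hourly_rate)
    print(f"RPO working hours: {rpo_work_hours}")
    print(f"Estimated cost: ${cost:,.2f}")
```

Substituting the real combined hourly rate of the affected employees turns the 32 lost working hours into the dollar figure a CFO can weigh against the price of the project.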
Using real-world examples to show why DR is important should lay the groundwork for approval of a DR project. Unfortunately, that offers little incentive to the CIO who asks, "What will this project do for me now?"
In those cases you need to find and sell an added benefit: immediate productivity gains that can be realized through 'Recovery as a Service.' Strand's IT staff now uses hourly snapshots on Falconstor's CDP device to perform restores for users. This added benefit increases the efficiency of both end-users and the IT staff. If an end-user makes a mistake in a file, wants an older version of a file, or loses her latest work to file corruption, she simply e-mails the help desk, and the IT staff can perform the restore in a matter of minutes. Prior to installing the Falconstor CDP solution, the IT staff could only provide a nightly restore point. If the end-user decided the nightly backup would be useful, a member of the IT staff had to try to locate a tape that would include the information needed, load the tape, wait for the job to complete, and hope that the tape had the correct data. Now the user simply names the nearest hourly snapshot, and a member of the IT staff mounts the snapshot and copies the requested files into live storage. The whole process takes 10 mouse clicks. Thus the answer to the question "What will this project do for me now?" is that the system will:
- Increase end-user productivity by reducing the time it takes to recover files.
- Increase end-user productivity by offering hourly snapshots and reducing the time it takes to redo lost/corrupt/erroneous work.
- Increase remote office staff productivity by reducing the time spent administering backups in lieu of dedicated IT staff.
- Increase IT productivity by reducing the time to administer backup and recovery.
That's what this project will do for you now.
Action Item: Explaining the importance of RPO and RTO to the CFO is hard. The best way to sell a backup and recovery project to senior executives is to explain the current situation, identify the risks, and explain the costs of mitigating those risks. In the case of Strand, the justification combined the productivity benefits of faster routine recoveries with the mitigation of disaster risk.