When it comes time for your business to develop contingency plans in the event of a disaster, there are a number of items that must be considered. Without performing reasonable due diligence, organizations risk implementing disaster recovery plans that:
- Don’t go far enough to meet business continuity needs
- Go too far based on the real cost of downtime
- Can’t be supported with existing personnel
In this article, I will present five items to consider in these efforts.
How much does downtime cost the company?
Implementing disaster recovery is not a technology initiative. It’s very much a business initiative that comes down to dollars and cents. At the beginning of the process, perform a business impact analysis to determine how much downtime ultimately costs the company. As a side note, you might run across a no-brainer solution that just makes sense and allows you to skip the formalities, but that won’t always happen.
As a part of this process, make sure to consider items beyond outages. What would happen, for example, if the company were to lose a week’s worth of data? What would that cost in terms of dollars as well as in terms of customer confidence?
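To make the analysis concrete, the following Python sketch totals the direct cost of a single incident. The cost categories (lost revenue, idle staff, re-creating lost data) and every figure are hypothetical assumptions for illustration, not a standard business impact analysis formula. The sketch deliberately leaves out hard-to-price effects such as lost customer confidence.

```python
from dataclasses import dataclass


@dataclass
class ImpactInputs:
    """Hypothetical inputs gathered during a business impact analysis."""
    revenue_per_hour: float              # revenue lost for each hour of outage
    idle_staff: int                      # employees unable to work during the outage
    loaded_hourly_wage: float            # fully loaded cost per employee-hour
    rework_cost_per_hour_of_data: float  # cost to re-create one hour of lost data


def incident_cost(inputs: ImpactInputs, downtime_hours: float,
                  data_loss_hours: float) -> float:
    """Direct cost of one incident; excludes lost customer confidence."""
    lost_revenue = inputs.revenue_per_hour * downtime_hours
    idle_labor = inputs.idle_staff * inputs.loaded_hourly_wage * downtime_hours
    rework = inputs.rework_cost_per_hour_of_data * data_loss_hours
    return lost_revenue + idle_labor + rework


inputs = ImpactInputs(revenue_per_hour=10_000, idle_staff=50,
                      loaded_hourly_wage=60, rework_cost_per_hour_of_data=500)
# An 8-hour outage that also loses a week (168 hours) of data:
print(f"{incident_cost(inputs, downtime_hours=8, data_loss_hours=168):,.0f}")  # 188,000
```

Even with made-up numbers, the split shows that losing a week of data can cost as much as the outage itself.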
The business impact analysis will help you in planning the remainder of your disaster recovery efforts.
How much risk is the company willing to assume?
Outages and data loss cost money. Based on the results of the business impact analysis, the company can start deciding how much risk of downtime and data loss it is willing to tolerate.
Here, armed with information about how much outages and data loss cost the company, management can start to determine what should be implemented from a disaster recovery perspective. This is also the point at which reality will set in. For example, companies will start to figure out that implementing a zero data loss, zero downtime environment can be pretty expensive.
However, at some point the cost of a disaster recovery solution will intersect with the costs of downtime and data loss identified previously. This is the sweet spot.
Here, determine recovery time objectives (RTO) and recovery point objectives (RPO). RTO defines the organization’s tolerance for downtime while RPO defines the organization’s tolerance for data loss. A disaster recovery solution will consider both of these metrics during solution development.
Just as is the case with other kinds of insurance, the organization can decide to implement more or less disaster recovery capability. If the costs for the desired RTO and RPO values are too high, the targets can be relaxed to meet reasonable business needs; if the budget allows, they can be tightened.
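One way to locate that sweet spot is to add each candidate solution’s annual cost to the loss you would still expect to absorb with it, then pick the lowest total. The sketch below assumes a hypothetical incident rate, hypothetical hourly costs and a simple linear model in which each incident costs exactly the option’s RTO and RPO. Real figures should come from your own business impact analysis.

```python
from typing import List, NamedTuple


class DrOption(NamedTuple):
    """A candidate disaster recovery solution (all figures hypothetical)."""
    name: str
    annual_cost: float  # yearly cost to build and run the solution
    rto_hours: float    # downtime per incident the option is designed for
    rpo_hours: float    # data loss per incident the option is designed for


def total_annual_cost(opt: DrOption, downtime_cost_per_hour: float,
                      data_loss_cost_per_hour: float,
                      incidents_per_year: float) -> float:
    """Solution cost plus expected yearly loss, assuming each incident
    costs exactly its RTO and RPO (a deliberate simplification)."""
    per_incident = (opt.rto_hours * downtime_cost_per_hour
                    + opt.rpo_hours * data_loss_cost_per_hour)
    return opt.annual_cost + incidents_per_year * per_incident


def sweet_spot(options: List[DrOption], downtime_cost_per_hour: float,
               data_loss_cost_per_hour: float,
               incidents_per_year: float) -> DrOption:
    """Pick the option with the lowest total annual cost."""
    return min(options, key=lambda o: total_annual_cost(
        o, downtime_cost_per_hour, data_loss_cost_per_hour, incidents_per_year))


options = [
    DrOption("Nightly backups", 20_000, rto_hours=72, rpo_hours=24),
    DrOption("Warm standby site", 120_000, rto_hours=8, rpo_hours=1),
    DrOption("Active-active, zero loss", 600_000, rto_hours=0, rpo_hours=0),
]
# Assumed: $13,000/hour downtime, $500/hour of lost data, one incident every two years.
for o in options:
    print(f"{o.name}: {total_annual_cost(o, 13_000, 500, 0.5):,.0f}")
# Nightly backups: 494,000
# Warm standby site: 172,250
# Active-active, zero loss: 600,000
print("Sweet spot:", sweet_spot(options, 13_000, 500, 0.5).name)  # Warm standby site
```

With these assumed inputs, the zero-loss option costs more than the losses it prevents, while the cheapest option leaves too much exposure. If the incident rate or hourly costs change, the sweet spot moves.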
Is the desired risk avoidance outcome achievable with current resourcing levels?
Buying high levels of insurance may be exactly what the company wants. There may be a desire to implement RTOs and RPOs that are as close to zero loss as possible and a desire to failback to a primary system as quickly as possible. The company might even be willing to spend the money to implement technical systems that can achieve these goals.
However, there is another critical resource that can’t be overlooked in this equation: people. When reviewing the process necessary for the agreed-upon disaster recovery plan, ask whether there are enough sufficiently skilled people available to meet what might be stringent requirements. Nothing is worse than experiencing an incident with a gold-plated recovery plan, only to find that its Achilles’ heel is the lack of a robust workforce. Especially as organizations have gone on cost-cutting crusades, ensure that the human element in the disaster recovery plan is fully understood and appreciated.
How will the organization routinely test the agreed-upon plan?
With a great plan comes a need to make sure that the plan is well understood and can be executed when the time comes. Therefore, it’s important to include some level of testing whenever possible. This could be monthly, annually or somewhere in between. Testing the plan requires significant staff time, so scheduling tests too often could disrupt your production work.
DR testing will expose problems with the plan and check its key assumptions, including:
- People who may not fully understand their roles
- The suitability of the communications channels used in the environment
- Whether the RTO and RPO can actually be met with the plan as-is
- The suitability of the recovery site(s)
Test the whole plan, not just the easy parts.
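A test is only useful if its results are compared against the targets. The hypothetical helper below takes three timestamps: when the simulated failure began, when service was restored, and the most recent write that survived recovery. It then checks the measured downtime and data loss against the RTO and RPO. The function name and timestamps are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestResult:
    achieved_rto: timedelta  # how long service was actually down
    achieved_rpo: timedelta  # how much data (measured in time) was actually lost
    rto_met: bool
    rpo_met: bool


def evaluate_dr_test(failure_at: datetime, service_restored_at: datetime,
                     last_recovered_write_at: datetime,
                     rto: timedelta, rpo: timedelta) -> DrTestResult:
    """Compare what a DR test actually achieved with the RTO and RPO targets."""
    achieved_rto = service_restored_at - failure_at
    achieved_rpo = failure_at - last_recovered_write_at
    return DrTestResult(achieved_rto, achieved_rpo,
                        rto_met=achieved_rto <= rto,
                        rpo_met=achieved_rpo <= rpo)


result = evaluate_dr_test(
    failure_at=datetime(2024, 3, 2, 9, 0),
    service_restored_at=datetime(2024, 3, 2, 13, 30),
    last_recovered_write_at=datetime(2024, 3, 2, 8, 15),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(result.achieved_rto, result.rto_met)  # 4:30:00 False
print(result.achieved_rpo, result.rpo_met)  # 0:45:00 True
```

In this made-up run the plan protects the data but misses the recovery time target, which is exactly the kind of gap a test should surface before a real incident does.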
How do your partners protect your organization’s data?
Especially as the cloud plays a more prominent role in today’s IT activities, organizations are partnering with more and more vendors to provide essential services that would otherwise be too expensive to operate in-house or that improve operational efficiency. A major outage at one of these partners can hurt the organization as much as an internal failure.
As intertwined as internal services have become over the years, don’t forget just how intertwined these external services are, too. As a part of your planning effort, work with your strategic partners to ensure that they have disaster recovery plans that meet your business requirements and that your own plan takes into consideration the impact that failover and failback may have on your partners’ services.
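A simple first pass is to list each partner’s stated recovery commitments and flag any that are missing or looser than your own targets. The sketch below uses hypothetical partner names and figures. It checks each partner in isolation and does not model chained dependencies, where one partner’s recovery waits on another’s.

```python
from typing import List, NamedTuple, Optional, Tuple


class PartnerCommitment(NamedTuple):
    partner: str
    rto_hours: Optional[float]  # None: partner has not stated a recovery time
    rpo_hours: Optional[float]  # None: partner has not stated a data-loss limit


def partner_gaps(commitments: List[PartnerCommitment],
                 required_rto_hours: float,
                 required_rpo_hours: float) -> List[Tuple[str, str]]:
    """Return (partner, problem) pairs where a stated commitment is missing
    or looser than the organization's own RTO/RPO targets."""
    gaps = []
    for c in commitments:
        for label, stated, required in (("RTO", c.rto_hours, required_rto_hours),
                                        ("RPO", c.rpo_hours, required_rpo_hours)):
            if stated is None:
                gaps.append((c.partner, f"no stated {label}"))
            elif stated > required:
                gaps.append((c.partner, f"{label} {stated:g}h exceeds {required:g}h"))
    return gaps


partners = [
    PartnerCommitment("Payroll SaaS", rto_hours=24, rpo_hours=0.25),
    PartnerCommitment("Cloud hosting", rto_hours=2, rpo_hours=0.5),
    PartnerCommitment("Email provider", rto_hours=None, rpo_hours=1),
]
for partner, problem in partner_gaps(partners, required_rto_hours=4, required_rpo_hours=1):
    print(f"{partner}: {problem}")
# Payroll SaaS: RTO 24h exceeds 4h
# Email provider: no stated RTO
```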
Action Item: Disaster recovery plans are an insurance policy that organizations hope they never have to use. However, when the time comes, ensuring that plans are workable and there are no unexpected partner issues is of prime importance. Make sure that your plan meets the needs identified in your business impact analysis, that your staff can handle the work and that your partners don’t become an anchor that dooms your efforts.