Most organizations have deployed disaster recovery technology (most notably, for backup and recovery of data). Many of these also have a formal disaster recovery plan, outlining what needs to be done to prepare for and recover from various types of disasters. But fewer enterprises have developed a comprehensive, risk-based disaster recovery plan and processes on which to base the acquisition of recovery technology, the staffing of disaster teams, and the design of recovery processes. As a result, the recovery technologies and processes in place either fall short of the business's disaster recovery requirements or (just as bad) exceed them at excessive cost. Management then complains that it does not understand the business case for disaster recovery.
Planning disaster recovery capability and processes on a risk basis helps to avoid that pitfall. The aim of this note is to show you how.
Enhanced disaster recovery process capability
Disaster recovery planning involves much more than backup planning, offsite storage, and locating a recovery site. Organizations must have written, risk-based, tested disaster recovery plans that address the critical operations and functions of the business and supporting technology.
The actual probability of a disaster occurring in an organization depends on that organization's threat and vulnerability profile (determined during the planning process), as well as a healthy dose of uncertainty. A disaster plan acts as an insurance policy: it helps ensure that, if a disaster does occur, it need not lead to financial, commercial, or reputational loss. Planning brings the following benefits:
- Minimizing potential disruption, exposure, and loss in the event of a disaster
- Reducing the probability that a disruption escalates into a disaster, through robust recovery technology
- Minimizing insurance premiums
- Reducing single points of failure in people or technology, and enhancing safety
- Protecting the assets of the organization
- Improving documentation of processes and technology
- Minimizing the decisions that must be made under pressure during a disaster
- Minimizing legal liability
Specific operational goals of implementing an enhanced disaster recovery process capability
The goal of enhancing disaster recovery planning is to ensure that all critical business units, processes, and supporting technologies are covered by a robust disaster recovery plan covering a number of relevant disaster scenarios. There are specific targets associated with implementing an enhanced disaster recovery process capability, including:
- Percentage of businesses covered by a disaster recovery plan. Example metrics:
  - All businesses that contribute significantly to revenue (>10%)
  - All businesses that provide significant components or services to other parts of the business (the solution may be second sourcing rather than disaster recovery)
  - Overall, plans should cover 85%+ of business revenue
- Percentage of business units subject to an up-to-date business impact analysis (BIA) and risk assessment. Example metrics:
  - No covered business unit last assessed more than 3 years ago
  - Fewer than 30% of covered business units last assessed more than 2 years ago
  - All key business units assessed within the last 2 years
- Number of key disaster scenarios covered in the plan. Example metrics:
  - Should cover at least 80% of identified risks
  - Should cover all key technologies in the business units covered
- Percentage of plans tested. Example metrics:
  - No covered business unit's disaster plan last tested more than 2 years ago
  - Fewer than 30% of business unit disaster plans last tested more than 1 year ago
  - All key business unit disaster plans tested within the last year
- Percentage of users trained in the plan (by business or technology area). Example metrics:
  - Plan reviewed every year by all top executives in all key business units (with emphasis on the overall plan, key implementing people, key resources, and key contact information; the summary should fit in a pocket and/or be available on a PDA)
  - All top line-of-business executives should take part in at least one disaster recovery (DR) test, and its review, every year
  - 90%+ of line-of-business managers should take part in a formal review of the DR plans for their part of the business every year
  - 70%+ of line-of-business managers should be involved in testing and reviewing the DR plans for their part of the business
- Plan testing success versus recovery goals, per plan:
  - All recovery tests that do not meet the recovery goals should be reviewed by line-of-business management
  - All recovery tests of a significant business function that do not meet the goals should be reviewed by line-of-business executive management
- Overall backup, recovery, and disaster recovery costs per user (the normal range is 5-10% of IT cost per user)
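The example metrics above are simple enough to check mechanically once the underlying data has been collected. The following Python sketch assumes a hypothetical per-unit record (`BusinessUnit`, holding revenue, plan coverage, a key-unit flag, and the age in years of the latest BIA and plan test); none of these names come from a particular tool, and the thresholds are taken directly from the list above. It covers the revenue, BIA, testing, and cost targets only; the scenario and training targets would need additional data.

```python
from dataclasses import dataclass


@dataclass
class BusinessUnit:
    # Hypothetical record layout; the field names are illustrative only.
    name: str
    revenue: float           # annual revenue attributed to the unit
    covered: bool            # unit has a disaster recovery plan
    key: bool                # unit is designated a key business unit
    years_since_bia: float   # age of the latest BIA / risk assessment
    years_since_test: float  # age of the latest DR plan test


def share(units, predicate):
    """Fraction of units for which predicate is true (0.0 for an empty list)."""
    if not units:
        return 0.0
    return sum(1 for u in units if predicate(u)) / len(units)


def check_targets(units, dr_cost_per_user, it_cost_per_user):
    """Return a dict of target name -> True/False for the example metrics."""
    covered = [u for u in units if u.covered]
    total_revenue = sum(u.revenue for u in units)
    cost_ratio = dr_cost_per_user / it_cost_per_user
    return {
        # Coverage: every unit with >10% of revenue, and 85%+ of revenue overall
        "large_units_covered": all(
            u.covered for u in units if u.revenue / total_revenue > 0.10),
        "revenue_coverage": sum(u.revenue for u in covered) / total_revenue >= 0.85,
        # BIA age: none over 3 years, under 30% over 2 years, key units within 2
        "bia_max_age": all(u.years_since_bia <= 3 for u in covered),
        "bia_share_over_2y": share(covered, lambda u: u.years_since_bia > 2) < 0.30,
        "bia_key_units": all(u.years_since_bia <= 2 for u in covered if u.key),
        # Testing: none over 2 years, under 30% over 1 year, key units within 1
        "test_max_age": all(u.years_since_test <= 2 for u in covered),
        "test_share_over_1y": share(covered, lambda u: u.years_since_test > 1) < 0.30,
        "test_key_units": all(u.years_since_test <= 1 for u in covered if u.key),
        # Cost: backup/recovery/DR cost per user within 5-10% of IT cost per user
        "cost_in_range": 0.05 <= cost_ratio <= 0.10,
    }


# Example with made-up figures
units = [
    BusinessUnit("Retail", 600.0, True, True, 1.0, 0.5),
    BusinessUnit("Wholesale", 300.0, True, True, 2.5, 1.5),
    BusinessUnit("Online", 50.0, False, False, 4.0, 3.0),
]
report = check_targets(units, dr_cost_per_user=80.0, it_cost_per_user=1000.0)
failed = [name for name, ok in report.items() if not ok]
print("Targets missed:", failed)
```

Returning a name-to-boolean dictionary keeps each target independently reportable, so a steering committee can see exactly which targets are missed rather than a single pass/fail score.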
Risks of implementing an enhanced disaster recovery capability
There are few risks involved in implementing an improved disaster recovery process. The main one is that doing so may uncover weaknesses in your current business operations, recovery architecture, or recovery resourcing and provisions that must be addressed before the plan can be implemented. The risk of diverting resources from other business priorities is usually offset by lower expected costs of failure and by lower spending on protection against unlikely risks.
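The trade-off described above (protection spending versus the expected cost of failure) can be made concrete with the usual expected-loss arithmetic: the annual probability of a scenario multiplied by its impact. The sketch below is illustrative only; the function names and all scenario figures are hypothetical, and in practice the probabilities carry the uncertainty noted earlier, so the result should inform rather than replace management judgment.

```python
def expected_annual_loss(annual_probability, impact):
    """Expected yearly loss from one scenario: likelihood times impact."""
    return annual_probability * impact


def protection_pays_off(annual_probability, impact_without, impact_with, annual_cost):
    """True if the cut in expected yearly loss exceeds the yearly cost of protection."""
    saving = (expected_annual_loss(annual_probability, impact_without)
              - expected_annual_loss(annual_probability, impact_with))
    return saving > annual_cost


# Hypothetical scenarios: (name, annual probability, loss without plan,
#                          loss with plan, annual cost of the protection)
scenarios = [
    ("Data centre fire", 0.02, 5_000_000, 500_000, 60_000),
    ("Regional earthquake", 0.001, 20_000_000, 2_000_000, 250_000),
]
for name, p, without, with_plan, cost in scenarios:
    verdict = "worth funding" if protection_pays_off(p, without, with_plan, cost) else "review"
    print(f"{name}: {verdict}")
```

With these made-up numbers, protection against the fire scenario saves more in expected loss than it costs each year, while the earthquake protection does not; regulatory or contractual requirements may still justify it.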
The effective disaster recovery process solution
The business driver for improving disaster recovery planning and processes is to ensure that planning realistically considers the risks and vulnerabilities each particular organization faces when balancing user recovery requirements against the cost of the recovery solution. The need for a well-planned disaster recovery process is not only internally driven; external stakeholders such as customers, business partners, regulators, and auditors all expect business continuity. Implementing a disaster recovery process solution proceeds in three phases: needs analysis, plan design, and plan deployment.
Expectations (Out-of-scope)
Making technology arrangements to support recovery is an important part of disaster recovery, but it is out of scope for this note, which focuses instead on the planning and recovery processes. In addition, it is assumed that the BIA process itself is established within the organization, and that trained staff and identified management resources are assigned to it.
Analyze phase
Building a disaster recovery project team is the first stage in analyzing disaster recovery processes. Such a team should be under the governance of a higher-level disaster recovery steering committee, or some existing committee that can provide meaningful oversight (IT Steering Committee, Risk Steering Committee, etc.). The team typically includes:
- A sponsor, who is charged with making key project decisions (funding, scope, timelines, quality of service) and with keeping the project focused
- A disaster recovery manager, in charge of coordinating and project managing the effort (in large organizations a dedicated project manager may be hired to assist / support the disaster recovery manager), including managing project risks and ensuring that all project team members have undergone basic disaster recovery training
- A project administrator, responsible for coordinating project meetings, tracking actions and deliverables, and minuting meetings
- Operational specialists, who work with backup and recovery operations on a daily basis and will help to architect and engineer technical solutions
- Security specialists, who will help build security architecture around the disaster recovery architecture
- Auditors, who will ensure that the disaster recovery process provides for sufficient internal control and meets any regulatory or other external control requirements
- Key business users, who will define business requirements for recovery and then help to test recovery solutions against these requirements
The team’s first objectives are to define a mission statement, conduct a business impact analysis (BIA) / risk assessment, define recovery objectives, and document recovery requirements and metrics. If a disaster recovery policy does not yet exist, one also needs to be defined. These are all crucial inputs into building a recovery strategy, a disaster recovery plan, a recovery organization, and a backup and recovery architecture.
At this stage, any automation tools to help in disaster recovery planning (BIA development, risk assessments, plan development, etc.) are acquired, and key staff receive training in these tools.
For a company similar to the Standard WikiBon business model, the effort might be in the range of 5-15 man-weeks ($10,000-$30,000), with an elapsed time of 4-5 weeks (excluding any specific BIA work).
Acceptance test considerations
The Analyze Phase is complete when the initial disaster recovery planning documentation has been completed. This phase can take anywhere from a few weeks in a smaller, less complex organization to several months in a large, multifaceted operation.
Key analysis milestones
Milestones in the Analyze Phase typically include the following:
- A disaster recovery planning team is in place.
- The mission statement clearly defines the purpose, goals, and scope of the recovery effort, and any constraints or limitations.
- The business impact analysis (BIA) is complete and signed off.
- Threats, vulnerabilities, impacts, and exposures are clear.
- Financial, technology, operational, and resource impacts are all considered.
- Recovery point objective (RPO) and recovery time objective (RTO) values have been generated by the BIA process and/or by business users, and have been signed off by management.
- Recovery requirements are clear, and traceable to the original mission, the BIA, and metrics like RPO and RTO.
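RPO and RTO values are only useful if recovery arrangements can be checked against them. The following is a minimal sketch, assuming a simplified model in which data reaches the recovery location as periodic copies: the worst case is a disaster just before the next copy arrives, so up to one copy interval plus the transfer time of data is lost. The function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(copy_interval, transfer_time):
    """Simplified model: if disaster strikes just before the next copy reaches the
    recovery location, everything since the previous copy was taken is lost."""
    return copy_interval + transfer_time


def meets_rpo(copy_interval, transfer_time, rpo):
    return worst_case_data_loss(copy_interval, transfer_time) <= rpo


def meets_rto(estimated_recovery_time, rto):
    return estimated_recovery_time <= rto


# Hypothetical requirement signed off in the BIA: RPO 24 h, RTO 48 h
rpo, rto = timedelta(hours=24), timedelta(hours=48)
# Nightly backups shipped offsite the next morning (about 12 h in transit)
print(meets_rpo(timedelta(hours=24), timedelta(hours=12), rpo))
print(meets_rto(timedelta(hours=36), rto))
```

In this example, nightly backups shipped offsite the next morning do not meet a 24-hour RPO, even though the backup itself runs every 24 hours, while the estimated 36-hour recovery fits within the 48-hour RTO.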
Design phase
The design phase involves taking the threats, vulnerabilities, impacts, and exposures identified in the risk assessment and BIA and using them to formulate a risk-based disaster recovery plan on which recovery technologies can be acquired and deployed, recovery teams staffed, and recovery processes determined. By the end of the design phase:
- disaster definition has been agreed (note that there can be many types of disasters)
- mitigation controls and associated costs are determined
- mission critical services are prioritized
- recovery goals are clear (identical performance, availability, and other service levels to a normal business environment may not be realistic)
- recovery assumptions are agreed
- recovery team resourcing is clear, and the role of each recovery team has been defined
- disaster invocation and notification processes are described
- disaster plan maintenance procedures are clear
- processes for testing the plan, training recovery team members, and notifying all affected parties are clear
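Prioritizing mission critical services usually has to respect dependencies as well as urgency: an order-entry application cannot be restored before the database and network it runs on. The sketch below assumes you have recorded each service's RTO (in hours) and its dependencies, for example during the BIA; the service names and figures are hypothetical. It is a topological sort (Kahn's algorithm) that, among services whose dependencies are already recovered, picks the one with the shortest RTO first. Every dependency must itself be listed as a service.

```python
import heapq


def recovery_order(rto_hours, depends_on):
    """Order services so that each follows its dependencies; among services whose
    dependencies are already recovered, the shortest RTO goes first."""
    # Outstanding (not yet recovered) dependencies for each service
    pending = {s: set(depends_on.get(s, ())) for s in rto_hours}
    # Reverse edges: which services are waiting on each service
    waiting_on = {s: [] for s in rto_hours}
    for service, deps in pending.items():
        for dep in deps:
            waiting_on[dep].append(service)

    # Min-heap of (RTO, name) for services with no outstanding dependencies
    ready = [(rto_hours[s], s) for s, deps in pending.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, service = heapq.heappop(ready)
        order.append(service)
        for other in waiting_on[service]:
            pending[other].discard(service)
            if not pending[other]:
                heapq.heappush(ready, (rto_hours[other], other))

    if len(order) != len(rto_hours):
        raise ValueError("circular dependency between services")
    return order


rto_hours = {"network": 8, "email": 24, "order_entry": 4, "database": 4, "payroll": 72}
depends_on = {
    "order_entry": ["database", "network"],
    "database": ["network"],
    "email": ["network"],
    "payroll": ["database"],
}
print(recovery_order(rto_hours, depends_on))
```

For the example data the resulting order is network, database, order_entry, email, payroll: the network comes first despite its longer RTO because everything else depends on it, and order_entry is recovered before email even though email becomes recoverable earlier.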
Understanding recovery assumptions is critical. Some organizations assume that they are planning only for the worst-case scenario, which may involve the total loss of a processing facility, its technology, and many key staff. Others also plan for partial disaster scenarios (the facility is fine but certain servers are unusable, the facility and technology are fine but employees cannot gain access to the facility, and so on).
Also important is the choice of recovery locations. Alternate locations for housing staff, conducting business, and / or operating technology must be geared towards the risk basis of your plan, the defined disaster scenarios, and your plan assumptions. For technology recovery, a choice must be made (again, depending upon the same criteria) between hot, warm, and cold sites. And for each type of recovery location, the required recovery resources must be identified (telephone systems, computers, networking equipment, software, data, documents and forms, office supplies, other equipment, etc.).
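The site choice and resource identification described above can be sketched as two small checks. The RTO thresholds used here to pick a hot, warm, or cold site (8 and 72 hours) are illustrative assumptions only; real thresholds should follow from your own risk basis, scenarios, and assumptions. The resource names are taken from the list above, and the function names are hypothetical.

```python
def site_class(rto_hours, hot_max_hours=8, warm_max_hours=72):
    """Map an RTO to a recovery-site class.
    The default thresholds are illustrative assumptions, not industry standards."""
    if rto_hours <= hot_max_hours:
        return "hot"
    if rto_hours <= warm_max_hours:
        return "warm"
    return "cold"


def resource_gaps(required, provisioned):
    """Required recovery resources not yet provisioned at the recovery location."""
    return sorted(set(required) - set(provisioned))


required = ["telephone systems", "computers", "networking equipment",
            "software", "data", "documents and forms", "office supplies"]
provisioned = ["computers", "networking equipment", "software", "data"]
print(site_class(24))
print(resource_gaps(required, provisioned))
```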
This is one of the lengthier phases of the disaster recovery planning process, typically taking anywhere from two to several months. The cost is likely to be in the range of $30,000-$80,000.
Acceptance test considerations
The Design Phase is complete when the BIA and risk assessment have been used to generate a disaster recovery plan, when supporting teams are in place, and when recovery resources and facilities have been procured.
Key design milestones
Milestones in the Design Phase include the following:
- Recovery goals and priorities are clear.
- The disaster recovery plan has been documented.
- Recovery scenarios and supporting assumptions are clear.
- Mitigation has been put into place.
- Recovery team composition and goals are agreed.
- Other plan deliverables are clear and actioned.
Deploy phase
The deployment stage begins with providing training to all recovery teams. Each team has different objectives, and will therefore require different training. The management team, for example, will typically require training in overall plan structure, types of disasters, an overview of recovery strategies, disaster notification, disaster communications, overseeing the recovery process, and the process of moving from recovery to normal business operations. Technology recovery teams need more specific training in which capabilities need to be recovered, which recovery resources are available, exactly how backup and recovery systems work, and so on. Facilities / logistics teams need to know how to move operations from the affected facilities to the recovery facilities. User recovery teams will need to know about disaster notification and calling trees, where to report in the event of a disaster, which resources are available to them (and which are not), and what they are expected to do and prioritize during the disaster. Finally, there are training issues common to all teams (such as evacuation procedures during a disaster).
Next, disaster recovery documentation is created. Documentation will normally include the following elements (not necessarily in separate documents):
- The BIA and risk assessment
- Calling trees and notification details
- Plan overview
- Disaster scenario definition
- Disaster declaration policy and process
- Listing of recovery resources
- Description of composition and roles of recovery teams
- Listing of recovery sites
- Diagrams and descriptions of recovery technology
- Recovery procedures for each team
- Policy and procedures for moving back to normal business operations
- Disaster recovery training documents
- Disaster communications policies and procedures
- Security procedures and resources during the disaster
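Calling trees, listed above as part of the documentation, are easy to let drift out of date as staff change. A minimal sketch, assuming a calling tree recorded as a mapping from each caller to the people they must call (all names hypothetical): a breadth-first walk gives the notification waves, and comparing the people reached against the staff list shows who the tree would miss.

```python
def notification_waves(tree, root):
    """Breadth-first walk of a calling tree. Wave 0 is the person who declares
    the disaster, wave 1 the people they call, and so on."""
    waves = []
    seen = {root}
    frontier = [root]
    while frontier:
        waves.append(frontier)
        next_frontier = []
        for caller in frontier:
            for callee in tree.get(caller, []):
                if callee not in seen:  # skip anyone already called
                    seen.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return waves


def missed_by_tree(tree, root, staff):
    """Staff members the calling tree never reaches."""
    reached = {person for wave in notification_waves(tree, root) for person in wave}
    return sorted(set(staff) - reached)


calling_tree = {
    "dr_manager": ["it_lead", "facilities_lead", "comms_lead"],
    "it_lead": ["dba", "network_admin"],
    "comms_lead": ["receptionist"],
}
staff = ["dr_manager", "it_lead", "facilities_lead", "comms_lead",
         "dba", "network_admin", "receptionist", "payroll_clerk"]
for i, wave in enumerate(notification_waves(calling_tree, "dr_manager")):
    print(f"wave {i}: {', '.join(wave)}")
print("not reached:", missed_by_tree(calling_tree, "dr_manager", staff))
```

Running a check like this whenever the staff list changes is one simple way to keep the calling tree aligned with plan maintenance.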
Finally, the disaster recovery plan (including supporting documentation) is tested in a re-enactment of the various disaster scenarios. This will require the cooperation of participating vendors, including those providing any hot, warm, or cold recovery sites. At this point, the feasibility of the plan and the adequacy and compatibility of the technical environment are assessed. Testing will also require management support and assistance from the various user and technology recovery teams. Based on test findings, adjustments may need to be made to recovery technology, resourcing, facilities, procedures, or other aspects of the plan.
The deployment and testing of a recovery solution typically takes at least several weeks. Typical estimates for the Standard WikiBon business model would range from $30,000 to $80,000, depending on the problems identified and excluding any investment in new technologies.
Acceptance test considerations
The Deploy Phase is complete when recovery team members are trained, when documentation is complete, and when the entire plan (including documentation, participants, resources, facilities, and technology) has been tested under the relevant disaster scenarios.
Key deployment milestones
Milestones in the Deploy Phase include the following:
- Training requirements are determined.
- Appropriate training is given to affected project teams and users.
- The disaster recovery plan and supporting processes are documented.
- A set of disaster test plans is composed.
- The disaster recovery plan is tested and results are compared to plan goals and requirements.
- Testing issues or gaps in the plan are addressed through plan modification, resource enhancement, facility enhancement, or similar. Lessons learned from testing are documented along with test results.
- A plan maintenance program is put into place.
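Comparing test results with plan goals can be combined with the review targets listed earlier in this note (every failed test reviewed by line-of-business management, and failed tests of significant business functions also reviewed by line-of-business executive management). The sketch below assumes a hypothetical test record with RTO and RPO goals and achieved values, in hours; the record layout and sample results are illustrative, not taken from any tool.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTestResult:
    # Hypothetical record of one DR test; times are in hours.
    plan: str
    significant_function: bool   # plan covers a significant business function
    rto_goal: float
    rto_achieved: float          # time actually taken to recover
    rpo_goal: float
    rpo_achieved: float          # span of data actually lost

    def met_goals(self):
        return self.rto_achieved <= self.rto_goal and self.rpo_achieved <= self.rpo_goal


def reviewers(result):
    """Who must review a test that missed its recovery goals (empty if none)."""
    if result.met_goals():
        return []
    needed = ["line-of-business management"]
    if result.significant_function:
        needed.append("line-of-business executive management")
    return needed


results = [
    RecoveryTestResult("Payments", True, rto_goal=4, rto_achieved=6, rpo_goal=1, rpo_achieved=0.5),
    RecoveryTestResult("Intranet", False, rto_goal=48, rto_achieved=24, rpo_goal=24, rpo_achieved=12),
]
for r in results:
    print(r.plan, "met goals" if r.met_goals() else "missed goals", reviewers(r))
```

Keeping the escalation rule in one function makes it easy to record, alongside each test result, who reviewed a missed goal and what lessons were learned.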
Initiative summary
Organizations small to large, simple to complex, all face the risk of disaster. The nature of these risks, vulnerabilities, and business impacts varies from firm to firm, as does the particular design of the recovery architecture. But all enterprises should follow a common risk-based planning methodology to ensure that their recovery money is well spent and that objectives are met effectively. Starting with a risk analysis and business impact analysis, moving on to requirements and metrics definition, proceeding to planning, and only then assembling recovery teams, resources, facilities, and technologies will ensure that this happens. Testing and maintaining the plan completes the cycle.