The biggest problem in testing disaster recovery is the risk that the test itself will go wrong and create a disaster. If actual production is cut over to the backup system, transaction processing starts, and something then goes wrong, it can take days to fail back to the original system with traditional approaches.
To avoid this risk during traditional disaster recovery testing, actual production remains on the original systems, and a historic copy of data is recovered on the backup system. As business processes increasingly rely on IT systems, and IT systems become more integrated, this type of test is inadequate. Worse, it gives a false sense of security. As one CIO said, 'It is like relying on a function test of one module as a proxy for a full integration test of a system.'
The only way to properly test disaster recovery is to fail over production systems to the recovery processing node and to be able to fail back to the original node. This requires adequate bandwidth, equipment, tested procedures and trained operators. It also requires that failover and failback be exercised frequently as a normal operational activity.
There is clearly additional cost and opportunity cost in establishing a different backup process, and these have to be weighed against the reduction in expected and maximum loss from a disaster. There may be additional strategic benefits for organizations: 24x7 systems will be easier to implement, systems can be moved from node to node with greater ease, and the quality of operational staff can be improved. To reduce costs, particular attention needs to be paid to the distance between the sites that host the processing nodes, as increasing distance brings with it a non-linear increase in telecommunication costs. It is also more cost effective if sites can act as mutual backup for other sites.
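The cost evaluation described above can be sketched as a simple expected-loss comparison. This is a minimal illustration only: the class, the function and every figure in it (annual costs, disaster probability, outage days, loss per day) are hypothetical assumptions and are not taken from any real organization. It also treats maximum loss simply as the full outage duration multiplied by the daily loss.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    """One disaster recovery approach, described by hypothetical figures."""
    name: str
    annual_cost: float           # yearly cost of running this approach
    disaster_probability: float  # assumed chance of a disaster in a year
    outage_days: float           # assumed days needed to restore service
    loss_per_day: float          # assumed business loss per day of outage

    def maximum_loss(self) -> float:
        # Loss if the disaster actually happens.
        return self.outage_days * self.loss_per_day

    def expected_annual_loss(self) -> float:
        # Probability-weighted loss per year.
        return self.disaster_probability * self.maximum_loss()


def net_annual_benefit(current: RecoveryOption, proposed: RecoveryOption) -> float:
    """Reduction in expected loss minus the additional yearly cost."""
    added_cost = proposed.annual_cost - current.annual_cost
    loss_reduction = (current.expected_annual_loss()
                      - proposed.expected_annual_loss())
    return loss_reduction - added_cost


# All numbers below are illustrative assumptions, not measurements.
traditional = RecoveryOption("traditional restore", 200_000, 0.05, 5.0, 1_000_000)
failover = RecoveryOption("failover/failback", 350_000, 0.05, 0.5, 1_000_000)

if __name__ == "__main__":
    print("Net annual benefit:", net_annual_benefit(traditional, failover))
    print("Maximum loss avoided:",
          traditional.maximum_loss() - failover.maximum_loss())
```

A positive net annual benefit suggests the extra spend pays for itself on expected loss alone. The reduction in maximum loss is reported separately because a risk manager may judge it on its own terms, whatever the expected value.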
Action Item: CIOs should plan site, technology and software strategies on the assumption that organizations will need to implement failover/failback for business-critical systems (at the very least) within a three-year period. Traditional disaster recovery testing is often a 'go through the motions / tick in the box' exercise to placate regulatory authorities. CEOs and risk managers should not accept this as sufficient proof that adequate disaster recovery processes are in place.