For years, regulated industries such as finance, insurance, telecommunications, health care, and transportation, along with the industries that support them, have been required for regulatory and compliance reasons to demonstrate that they can fail over and fail back, often at asynchronous distances. Proving this capability has become increasingly difficult, because a failover to the backup site cannot be tested thoroughly without exposing production systems to risk during the test. Specifically, if something goes wrong in the testing process, the production systems could be damaged or lost permanently.
As a result, many companies test disaster recovery (DR) with a historical backup copy of the production system rather than the production system itself. While this approach approximates a disaster scenario, many feel it does not adequately simulate a real-life disaster.
Many organizations have responded by building three-site configurations in which two data centers sit within synchronous distance of each other and a third is placed many hundreds of miles away, out of the “synchronous danger zone.” This approach provides a near-zero recovery point objective (RPO) through synchronous replication, while the third data center adds another layer of protection in case of a calamitous local disaster. It also enables more adequate testing of DR processes. However, it is extremely expensive and appropriate only for the most valuable applications.
At the April 10, 2012 Wikibon Peer Incite Research Meeting, we heard from an industry practitioner that installing a locally synchronous, nearly indestructible “black box” from a company called Axxana has allowed his organization to avoid the exorbitant costs of a three-site data center approach while still enabling proper DR testing.
The nearly indestructible Axxana system captures the RPO delta from the IT shop's continuous data protection (CDP) solution, which takes snapshots at 15-minute intervals. The Axxana solution fills the DR gaps and provides the protection of a three-site setup at about one-third of the cost.
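The gap that the black box closes can be illustrated with simple arithmetic. The following is a minimal sketch, not a description of how Axxana or any CDP product is implemented. Only the 15-minute snapshot interval comes from the discussion above; the write rate, the class name, and the simplifying assumption that a locally captured delta reduces the exposure window to zero are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicationModel:
    """Idealized model of data exposure between remote snapshots."""

    snapshot_interval_min: float   # 15 minutes in the scenario described above
    write_rate_mb_per_min: float   # hypothetical figure, for illustration only

    def worst_case_window_min(self, delta_captured_locally: bool) -> float:
        # Without a protected local copy of the delta, everything written
        # since the last snapshot can be lost: up to one full interval.
        # Assumption: a surviving local copy of the delta shrinks that
        # window to zero in this simplified model.
        return 0.0 if delta_captured_locally else self.snapshot_interval_min

    def worst_case_loss_mb(self, delta_captured_locally: bool) -> float:
        window = self.worst_case_window_min(delta_captured_locally)
        return window * self.write_rate_mb_per_min


model = ReplicationModel(snapshot_interval_min=15, write_rate_mb_per_min=200)
print("Snapshots only (MB at risk):", model.worst_case_loss_mb(False))
print("Snapshots plus local delta (MB at risk):", model.worst_case_loss_mb(True))
```

In this model the exposure scales linearly with both the snapshot interval and the write rate, which is why a shop with a busy workload feels the 15-minute gap more acutely than one with light write traffic.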
The solution offers the best of both worlds and is especially attractive for mid-sized companies, because it delivers three-site-class protection using only two data centers at asynchronous distances. Because the Axxana system is part of normal IT operations, the approach can allow practitioners to replace risky, one-off DR tests with DR testing that is a routine part of IT operations.
The catch today is that the Axxana system is narrowly focused and supports only EMC’s RecoverPoint-based arrays. Over time, as the product is proven, we expect the system to target a much broader base of platforms. Nonetheless, the approach is intriguing, and when compared with alternative zero-data-loss solutions, it appears to have great potential.
Action Item: Proper DR testing has become a risky proposition for many companies, especially those that cannot afford three-site data center protection. IT organizations should consider new approaches, such as using nearly indestructible infrastructure locally (e.g., the Axxana black box idea) and leveraging asynchronous distance, to cut costs, reduce testing risk, and improve disaster tolerance.