To identify best practices in modern business continuity services, we interviewed Marc Langer, the President of Recovery Point Systems, a leading provider of backup and disaster recovery services. This article summarizes that interview.
It is vital to start fresh when considering the requirements of the 21st century. Most data centers are old and cannot be upgraded non-disruptively to support today’s power and cooling needs. Many centers were designed for 2 kW of power per rack, but today’s equipment needs 50 kW. Many users find they are contracting for empty racks just to get enough power to one rack. Moreover, almost none were designed for chilled-water equipment cooling. A good DR site is therefore a relatively new site designed for modern requirements, including concurrent maintenance and upgrades that permit changes to the site infrastructure without affecting operations.
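The rack-power mismatch can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a provider allocates a fixed power budget per contracted rack and lets a customer pool the budgets of several racks into one, which the interview implies but does not spell out. The function names are hypothetical.

```python
import math


def racks_to_contract(required_kw: float, design_kw_per_rack: float) -> int:
    """Racks a customer must contract so the pooled power budget covers one loaded rack.

    Assumption: each contracted rack carries a fixed power budget that can be
    pooled into a single physical rack.
    """
    if required_kw <= 0 or design_kw_per_rack <= 0:
        raise ValueError("power figures must be positive")
    return math.ceil(required_kw / design_kw_per_rack)


def empty_racks(required_kw: float, design_kw_per_rack: float) -> int:
    """Contracted racks that sit empty because only their power budget is used."""
    return racks_to_contract(required_kw, design_kw_per_rack) - 1


# Figures quoted in the article: 2 kW design budget, 50 kW equipment demand.
print(racks_to_contract(50, 2))  # 25
print(empty_racks(50, 2))        # 24
```

Under these assumptions, a single 50 kW rack in a facility designed for 2 kW per rack ties up 25 rack positions, 24 of which hold no equipment.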
The Uptime Institute has developed a Tiered Classification and Performance Standard for evaluating sites as follows:
- Tier I Basic Site Infrastructure
- Tier II Redundant Capacity Components Site Infrastructure
- Tier III Concurrently Maintainable Site Infrastructure
- Tier IV Fault Tolerant Site Infrastructure
Though many providers claim to have a Tier III site, very few such sites actually exist. Indeed, none have been certified by the Uptime Institute.
A fresh approach is also needed from the vendors to solve what Marc calls the triangle issue. Historically users have had:
- A provider of off-site storage for tapes,
- A hot site or hosting provider, and
- An expensive network challenge.
The elements needed for integrated, cohesive recovery sprang up in different industries, including records storage, computer leasing, hosting, and telecommunications. Today, when time is at a premium, many customers still end up contracting with multiple vendors at multiple locations and gambling their business on assembling a puzzle of services just when they are in a crisis and can least afford to do so. At best, it is an expensive, time-consuming, and unreliable process. At worst, it is dangerous.
Instead, a more holistic approach is needed. Start fresh with a modern scenario. In a perfect world, everything is available in one package and under one roof: a hot site, cold site, work area, off-site data storage and transportation, electronic data vaulting, and hosting, all tied together with carrier-class network resources. One solution means one vendor, one execution, one call to make, and one throat to choke. Moreover, a massive, durable, diverse, and flexible network is required to support quick recovery. Today, users must be able to move large amounts of data to a recovery location persistently.
Of course, the optimum is twin active data centers, perhaps with a tertiary one as well. Only a few users can afford this. The practical goal is to get as close to it as is economically possible, and an integrated provider is the best route: complete data protection, disaster recovery, and business operations continuity delivered from integrated and interconnected recovery sites. Even the big users with multiple sites or nodes scattered all over the world are starting to use hosted services in a DR site for at least one node. If they have a disaster elsewhere, they can begin moving more nodes into the DR site.
Interestingly, a fresh approach is not needed for testing a DR plan; it just needs to get done. And it is harder today. The bar keeps rising. We’ve gone from a recovery time objective (RTO) of days or a week to hours or less. Many users feel a DR drill is just an exercise in documenting what they forgot. However, a customer who tests successfully can survive. A customer who does not test may not be able to survive.
Site design checklist
In terms of site design and location, here’s a checklist:
Design:
- Designed to meet the High Level Threat design criteria for blast protection as published by GSA’s Interagency Security Committee.
Electrical:
- Dual underground power feeds via dispersed entrances.
- Up to 10 Megawatt reserve power generation with N+1 configuration.
- Seven-day, on-site fuel supply at full load.
- Dual-corded internal power distribution.
- 2N UPS support with 10-minute power supply at full load.
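The N+1 and 2N notations in this checklist describe how much spare capacity is installed beyond the minimum (N) needed to carry the full load. The sketch below only counts capacity; it does not model the independent distribution paths that a 2N design usually implies. The 2,000 kW unit size is a hypothetical figure chosen for illustration, not one from the interview.

```python
import math


def units_for_load(load_kw: float, unit_kw: float) -> int:
    """N: the minimum number of units needed to carry the full load."""
    if load_kw <= 0 or unit_kw <= 0:
        raise ValueError("load and unit capacity must be positive")
    return math.ceil(load_kw / unit_kw)


def installed_units(load_kw: float, unit_kw: float, scheme: str) -> int:
    """Units installed under an 'N+1' or '2N' redundancy scheme."""
    n = units_for_load(load_kw, unit_kw)
    if scheme == "N+1":
        return n + 1
    if scheme == "2N":
        return 2 * n
    raise ValueError(f"unknown scheme: {scheme}")


def survives_failures(load_kw: float, unit_kw: float, scheme: str, failed: int) -> bool:
    """True if the remaining units can still carry the full load."""
    remaining = installed_units(load_kw, unit_kw, scheme) - failed
    return remaining * unit_kw >= load_kw


# 10 MW of load served by hypothetical 2,000 kW units.
print(installed_units(10_000, 2_000, "N+1"))       # 6
print(installed_units(10_000, 2_000, "2N"))        # 10
print(survives_failures(10_000, 2_000, "N+1", 2))  # False
```

With these illustrative numbers, N+1 tolerates the loss of one unit, while 2N tolerates the loss of an entire set of five.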
Network:
- Carrier-neutral services.
- Dark fiber to carrier hotels for cross connects to all major providers.
- Primary and secondary local facilities are linked by a bi-directional, self-healing dark fiber ring to create a secure, geographically dispersed campus.
- Multiple lit Tier One providers with diverse paths and building entries.
- Dual DMARCs hundreds of feet apart supporting redundant fiber entries.
Mechanical:
- Fully redundant chiller plant (2N).
- Dual-fuel boiler plant (natural gas or diesel).
- Data centers provisioned with N+1 CRAC units.
- Self-sealing building designed with chemical/bio air purification systems.
- Drainage system above the 100-year flood plain.
Security and access control:
- Owner-occupied, fenced facilities.
- No signage or public access.
- All exterior building entrances protected by security doors which meet the UL 907 standards for ballistic protection and the ASTM Standard 1540 for forced entry protection.
- All windows must meet GSA ISC criteria for blast protection.
- Interior lobbies constructed as blast/ballistic containment areas per GSA/ISC requirements.
- Ballistic personnel isolation traps at key passage points and guard booths.
- Card key entry and biometric authentication.
- Remote mail delivery.
- Escort-only access inside hosting centers.
All sites should be:
- Unmarked and owner occupied.
- Equipped with anti-ram vehicle perimeter barriers.
- Provisioned with 100 percent UPS power for all critical infrastructure.
- Provisioned with 100 percent generator power for all critical infrastructure.
- Provisioned with a seven-day fuel supply.
- Provisioned with diverse fiber entries for communications vendors.
- Provisioned with high-speed transport to major Internet peering points.
No site should be:
- Near a symbolic terrorist target.
- Near a high-risk activity or in the flight path of a major airport.
- Within 25 miles of a major city.
- Served by the same electricity provider as another site.
- In an area with a high risk of natural disaster.
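The location exclusions above lend themselves to a simple screening step when comparing candidate sites. The following is a minimal sketch, not a tool described in the interview: the `Site` fields, the `location_violations` function, and the utility names in the example are all hypothetical, and each flag would in practice come from a human risk assessment.

```python
from dataclasses import dataclass

MIN_MILES_FROM_CITY = 25  # checklist: no site within 25 miles of a major city


@dataclass(frozen=True)
class Site:
    name: str
    miles_to_major_city: float
    electricity_provider: str
    near_symbolic_target: bool = False
    near_high_risk_activity: bool = False
    in_airport_flight_path: bool = False
    high_natural_disaster_risk: bool = False


def location_violations(site: Site, other_sites: list[Site]) -> list[str]:
    """Return the checklist exclusions that a candidate site fails."""
    problems = []
    if site.near_symbolic_target:
        problems.append("near a symbolic terrorist target")
    if site.near_high_risk_activity or site.in_airport_flight_path:
        problems.append("near a high-risk activity or in an airport flight path")
    if site.miles_to_major_city <= MIN_MILES_FROM_CITY:
        problems.append("within 25 miles of a major city")
    if any(o.electricity_provider == site.electricity_provider for o in other_sites):
        problems.append("shares an electricity provider with another site")
    if site.high_natural_disaster_risk:
        problems.append("high risk of natural disaster")
    return problems


# Hypothetical candidates; the figures are made up for illustration.
primary = Site("primary", miles_to_major_city=40, electricity_provider="Utility A")
backup = Site("backup", miles_to_major_city=10, electricity_provider="Utility A",
              in_airport_flight_path=True)

for problem in location_violations(backup, [primary]):
    print(problem)
```

Returning a list of failed criteria rather than a single pass/fail flag keeps the reasons visible, which is useful when no candidate site is perfect and trade-offs must be discussed.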
Action Item: Users with 21st-century requirements need to look for 21st-century data protection and DR facilities. These are few and far between. An integrated, holistic approach will result in a better recovery.