Storage Peer Incite: Notes from Wikibon’s March 3, 2010 Research Meeting
IT architectures, infrastructures, and the organizations that support them are being rocked by two huge trends that, between them, promise to alter the IT industry radically. Cloud computing is replacing in-house infrastructure with SaaS, IaaS, and similar services provided by third parties. The economic advantages of cloud computing raise the question: "Will we have any in-house IT department at all?" Simultaneously, virtualization is revolutionizing the IT infrastructure for those applications that do remain in house.
With all this change, not the least of which is the huge reduction in IT staff that these twin technology trends promise -- or threaten -- one vital issue can get lost. These huge changes totally alter the enterprise DR plan. Each introduces new risks while simultaneously mitigating or eliminating others. And while the net effect may be to decrease overall risk, the DR plan must be completely redone, down to the individual procedure level, to adjust to these changes. For instance, moving a business application to the cloud can, depending on the vendor's capabilities, greatly reduce the risk of downtime in a disaster by making recovery as simple as switching to another site. At the same time, however, it turns the last-mile Internet connection into a single point of failure, with the implication that a backhoe operator on a construction site down the street can shut the business down. G. Berton Latamore
Business Continuity 2010: How Cloud Computing and Virtualization Change Business Continuity and Disaster Recovery - A collaboration between the Disaster Recovery Journal (DRJ) and the Wikibon Project.
The March 2, 2010 Peer Incite meeting brought together the DRJ and Wikibon communities and featured two practitioners who are active as DRJ advisors.
- Randall Till, MBCP, Executive Council member on DRJ's advisory board and currently vice president, global business continuity management for MasterCard International, who has been implementing BC programs within several organizations during his 18-year career.
- Michele Turner, MBCP, FBCI, CISA, ITIL, DRJ Editorial Advisory Board member and Sr. Mgr of IT Risk Management and IT Governance at Microsoft Corporation. Mrs. Turner has 16 years' experience in business continuity and risk management efforts.
The purpose of the call was to explore how virtualization and cloud computing are impacting disaster recovery (DR) and business continuity (BC).
The key premise put forth to the DRJ and Wikibon communities was the following:
Virtualization is driving efficiencies and increased utilization. While delivering substantial savings to organizations, ironically, from a recovery perspective, virtualization consumes spare resources that often can be applied to business continuance; and hence organizations that aggressively pursue virtualization risk constricting agility from a BC standpoint. Cloud computing provides an opportunity to improve business flexibility and remove constraints by delivering elastic capacity for DR and business continuance.
Further, while bringing potential advantages, cloud computing carries risks that need to be understood and managed, including security, compliance, privacy and other operational risks related to business alignment. Examples include the ability to provide adequate recovery speeds due to the potential increased latencies of cloud computing and transparency of operations related to gaining visibility on key metrics (e.g. backup failures, system performance, RPO, RTO etc).
Several key points emerged from the call, including:
- As organizations increasingly pursue virtualization and cloud computing, demand for traditional business continuity expertise is on the rise. This is directly a function of the fact that BC and risk-management practitioners have visibility across an organization’s entire business technology portfolio and can provide a comprehensive view that is invaluable to cloud initiatives.
- Organizations aggressively pursuing virtualization and cloud computing need to exploit this expertise and cohere cloud initiatives with key BC metrics and disciplines, including risk management and, importantly, governance.
- It was the opinion of the DRJ and Wikibon communities that CIOs need to set the overall strategy for virtualization and cloud computing and, from a DR and BC standpoint, set the goal of building resiliency in as a fundamental part of business operations, as opposed to a “bolt-on” afterthought. A key component of this responsibility is the creation of an awareness of both opportunities and risks (see The CIO's Risk Management Role in the Adoption of Virtualization and Cloud Services).
- A key finding from the call was that organizations that want to drive virtualization and cloud computing deeper into their operations should start with governance to put in place a process to assess risks and identify/track metrics that are important to the organization.
- Scorecarding or other rating and ranking mechanisms were cited as preferred techniques to help identify high-visibility opportunities and risks and drive alignment. Keeping such approaches simple and easy to understand is more important than developing sophisticated quantification methodologies that won’t be widely accepted.
- Facilities are an often-overlooked aspect of cloud computing, but as organizations increasingly outsource activities they should be mindful to choose suppliers that follow Best Practices in 21st Century Business Continuity Services.
- Vendors that desire to sell DR and BC services to organizations aggressively pursuing virtualization and cloud computing should understand that a one-size-fits-all offering is not advisable. Variances in size of company, industry, and key priorities can be best addressed (either directly or with partners) with robust assessment and implementation services that can help align BC initiatives to organizational objectives.
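The scorecarding point above can be illustrated with a deliberately simple ranking. This is a minimal sketch, not a method described on the call: the 1-to-5 scales, the field names, the sample risks, and the likelihood-times-impact score are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RiskItem:
    """One row of a hypothetical BC/DR scorecard (all fields are illustrative)."""
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple product of the two ratings; an assumption chosen
        # because it is easy to explain to a non-technical audience.
        return self.likelihood * self.impact


def rank(items: list[RiskItem]) -> list[RiskItem]:
    """Return the items ordered from highest to lowest score."""
    return sorted(items, key=lambda item: item.score, reverse=True)


if __name__ == "__main__":
    # Sample entries are invented for illustration only.
    scorecard = [
        RiskItem("Last-mile Internet link failure", likelihood=3, impact=5),
        RiskItem("Cloud provider backup failure", likelihood=2, impact=4),
        RiskItem("Too little spare capacity after consolidation", likelihood=4, impact=4),
    ]
    for item in rank(scorecard):
        print(f"{item.score:>2}  {item.name}")
```

A scheme this small can be read at a glance, which matches the preference for simplicity over sophisticated quantification.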
On balance, the consensus of the communities is that while there is plenty of uncertainty with regard to how to best leverage cloud computing, the potential for improved DR and BC is tremendous and organizations should begin to plan now.
Action item: Virtualization and cloud technologies, in the context of disaster recovery and business continuity, are outpacing organizations’ ability to absorb the new model of computing. Nonetheless, opportunities to add business value are substantial. Specifically, organizations aggressively pursuing virtualization should task governance and risk management functions to develop plans and protocols designed to leverage cloud computing and enable DR and BC to become a fundamental component of business operations versus a one-off application-by-application afterthought.
Virtualization and cloud computing are rapidly becoming critical tools in the arsenal of CIOs. Virtualization enables more efficient use of existing IT resources. Even more important for the business units that a CIO supports, virtualization enables IT to respond rapidly to requirements for new server and application deployments. Cloud-based offerings are also critical, as they enable rapid sourcing of additional IT resources and support business unit migration toward a variable cost model, in which cost increases and decreases track to changes in demand, thus providing a more predictable cost structure.
Beyond the responsibility to be responsive to business units, in many organizations CIOs also have an oversight role, assessing and advising business units on the risk levels of their technology strategies and implementing solutions that manage risk in accordance with the risk profile adopted by the corporation.
Virtualization and cloud computing services, which enable IT to be more responsive to business unit demands, bring with them the potential for additional risk. Not everything that can be done with new technologies should be done. The same technologies that enable a business to be agile may also undermine a previously well-designed disaster recovery and business continuity plan. The CIO’s responsibility for responsiveness must be balanced with the responsibility for risk management. Ultimately, the assumption of risk, which is inherent in all business, is a business management decision, but, as Tony Scott, CIO at Microsoft, stated in an interview published in the Disaster Recovery Journal, the CIO must:
- Create awareness of potential areas of risk,
- Assess those risks and help the organization think about and quantify risks (and) what they represent,
- Have plans to address and mitigate (risk) in the most effective way,
- Have continuous feedback on how (the company is) doing against plans and gauging the effectiveness of those plans.
Ultimately, virtualization and cloud-based services will become a key part of a company’s ability to move from a disaster-recovery mindset to a continuous-operations basis. As Tony Scott stated, “Fear as a motivator is not good in this area,” but the ability to be up and running 24-by-7 may be. Until a continuous business operations infrastructure is achieved, high-visibility scorecards of disaster recovery capabilities and assumed operational risk are critical to driving business-unit awareness.
Action item: CIOs should embrace virtualization and cloud computing to drive down operational costs, enable an infrastructure that is more responsive to business needs, and shift the corporation to a more variable-cost model. This should not be done, however, without first developing a plan for assessing operational risk and the impact on disaster recovery and business continuity.
Virtualization and cloud computing technologies hold great promise to reduce costs and improve utilization. As a by-product, new services are being added to improve recovery capabilities and operational resiliency. Any new technology requires a catch-up period to adopt new processes, build controls, and establish operational governance. A key to success will be developing the processes to address system management, change control, and security requirements. As applications are migrated to a virtual cloud computing environment, these new processes must be available to effectively monitor and manage the environment. Business applications will also need to be enhanced to take full advantage of the new technology. Organizations will need to track applications, measure peak performance, and build resilient services. This will also include the ability to validate the recovery plans and capabilities while addressing capacity and latency concerns. The continued adoption of ITIL standards will help guide this transition.
Virtualization and cloud technology have the potential to significantly change the way we address Disaster Recovery (DR). The environment will provide greater flexibility to validate recovery capabilities, moving us closer to operational resiliency while at the same time reducing the complexity and size of DR plans. The ability to merge applications and monitor utilization on contingent hardware will reduce costs and help justify the investment in continuity services. However, the consolidation provided through virtualization must be measured against the capacity required to support a significant processing disruption. The planned switch of an application to support business needs (e.g., maintenance tasks) is not the same as supporting a significant processing disruption. The migration to a virtual cloud computing environment must be able to support disaster recovery requirements, including testing, while at the same time providing operational resiliency.
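The point about measuring consolidation against the capacity needed in a disruption can be made concrete with a small check. The following is a rough sketch under stated assumptions: host names, the abstract capacity units, and the single "critical load" figure are hypothetical, and a real assessment would also consider memory, storage, network, and latency.

```python
def surviving_capacity(host_capacities, failed_hosts):
    """Total capacity of the hosts that remain after the named hosts are lost.

    host_capacities: dict mapping a host name to its capacity (arbitrary units).
    failed_hosts: set of host names assumed lost in the disruption.
    """
    return sum(
        capacity
        for name, capacity in host_capacities.items()
        if name not in failed_hosts
    )


def can_absorb_disruption(host_capacities, critical_load, failed_hosts):
    """True if the surviving hosts can still carry the critical workload."""
    return surviving_capacity(host_capacities, failed_hosts) >= critical_load


if __name__ == "__main__":
    # Hypothetical consolidated environment: three hosts of 100 units each,
    # carrying a critical workload of 200 units.
    hosts = {"host-a": 100, "host-b": 100, "host-c": 100}
    critical_load = 200

    for failed in ({"host-a"}, {"host-a", "host-b"}):
        ok = can_absorb_disruption(hosts, critical_load, failed)
        print(sorted(failed), "->", "recoverable" if ok else "capacity shortfall")
```

In this made-up case the environment tolerates the loss of one host but not two, which is the kind of gap that aggressive consolidation can open without anyone noticing until a test or an outage.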
Action item: Technology controls, processes, and tools must be developed and implemented to ensure the proper governance of virtualization and cloud computing technology. This includes the integration of operational resiliency features with existing DR recovery plans and strategies.
As cloud computing develops, the IT infrastructure environments that need disaster recovery and business continuance services become more complex. Increasing sections of the IT infrastructure will be virtualized, and over time virtual machines will be migrated to other systems and other locations. Parts of the computing load will be using SaaS services, and part will be migrated to external clouds to create hybrid clouds.
This set of interconnected and integrated services will offer new ways of mitigating risk. Replication services that continuously copy the data from one site to another and then require reconciliation and restart can be replaced by dynamic migration of workloads. Data slicing and dispersal technologies can allow applications to continue to access data even if large chunks of the data are unavailable.
Equally, there will be additional risks. Virtualized systems have fewer resources available for recovery processing, because the resources are running at much higher utilization. Traditional DR plans did not take into account that systems may be virtualized and running in other locations. External network services may appear to have capacity but simply not have the resilience to cope with a local disaster affecting many organizations. Compliance and security of the hybrid cloud are extremely difficult to assess and monitor.
Action item: If you are selling DR and BC services you need to augment these capabilities (either directly or through partnerships) to provide assessment and governance planning services that dovetail to your offerings. Be aware there are many new risk factors that could kill your credibility; these will vary by industry. In an outsourced environment, your customers should demand and will expect transparent monitoring and communication of key metrics and performance indicators of infrastructure health.
The implication of new technology for assuring the continuity of business in the event of failures, successful cyber attacks, and other disruptions of normal operations is just one of the major challenges to business continuity. Coupled with budget pressure, gaining funding and commitments from the LoBs, driving toward consistent BC/DR standards compliance across vendors and services, dealing with pandemic and ever-growing security threats, and the need for increased public and private cooperation, the emergence of virtualization technologies and the cloud has created a set of implications for the enterprise that cannot be ignored.
However, the principles of business continuity planning and disaster recovery remain applicable to the cloud computing model. Cloud computing requires that business continuity and disaster recovery professionals be continuously engaged in vetting and monitoring resource virtualization and cloud computing plans and operations. The key challenge is to collaborate on risk identification, recognize interdependencies, and integrate and leverage resources in an effective way.
The Name of the Game - Enterprise Risk Management
The effectiveness of the Enterprise risk management program, in which business and service continuity is a primary discipline, is of course one of those implications. Organizations are faced with trading traditional technology risks and controls for new business and enterprise risks when sourcing strategies move technology infrastructure and responsibilities beyond the legacy data center environment and toward virtualized resources, virtualized business processes, private and public cloud services. This in essence causes a shift away from traditional enterprise technology control thinking (e.g., access management, change control, data security, infosec, backup, disaster recovery, and business continuity) and more toward service provider relationships, capabilities, and accountabilities, or in a nutshell, to business risk management - less technology to manage, less technology risk to some degree, and more complexity and business risk.
Cloud as a Business Continuity Enabler
In addition, the technologies that are virtualizing the data center and all business and organizational processes surrounding it create new opportunities to transfer and distribute technology and business risk management to third parties in ways that, if well conceived and implemented, will reduce the overall risk to the enterprise. For example, cloud services can serve as an important component of a quick-recovery, low-data-loss business continuity solution for the SMB market. Regularly synchronizing business operations with a Unisys, EMC, Google, Rackspace, Microsoft, Amazon, or other cloud service can make information and applications immediately accessible for a number of business failure scenarios. For larger organizations, cloud storage provides the ability to disperse data across multiple geographies and reduce single points of failure. Assuming they have created machine images that mirror their production environments, businesses can rapidly recover into the cloud without paying to run an entirely redundant data center 24x7.
Role of Information Governance
Information governance is one starting point for cloud business continuity initiatives. Information governance programs for the cloud are designed to help organizations protect, manage, and develop data, applications, and infrastructure as a valued asset, while maintaining a reasonable cost-to-risk ratio. Working across risk disciplines, including business continuity/DR, information governance programs ensure that technology and business virtualization initiatives, functions, and principles result in an infrastructure and service arrangements for managing high-quality operations that are available, reliable, consistent, auditable and secure.
Action item: COMING SOON - ALL YOUR DATA TO AND FROM THE CLOUD. Organizations should embrace virtualization and cloud computing as a business continuity enabler. Build business continuity and resiliency requirements into the journey toward data center resource virtualization, business process virtualization and application encapsulation, and internal and external cloud services.
On the March 2, 2010 Wikibon Peer Incite public call, we discussed how virtualization and cloud computing are influencing business continuity (BC) and disaster recovery (DR). One of the themes that emerged during the call was the notion that virtualization and cloud computing, either together or separately, might allow us to streamline some or all of our DR processes and procedures.
Virtualization and cloud computing both encourage the use of automation to spin up new servers easily and move operating systems and data from one place to another, both tasks that, one way or another, are part of recovering from a disaster. It seems that the everyday techniques used for virtualization and cloud computing processes are ideal candidates to incorporate into DR plans.
Taken a step further, these processes are nearly the same whether the reason is to perform normal operations or to recover, so it seems that a separate DR plan can be highly streamlined to refer to standard daily procedures and include only those extra processes required in an emergency. As these processes become more standardized, they become easier to test on a regular basis. This may be optimistic and possibly simplistic, but testing the DR plan may become as simple as migrating a set of servers and their data to another location. It may even come to the point, using virtualization and cloud computing services, that the “DR manual” could focus more on what should be done for the people involved in the situation and less on the computers and data.
Action item: Compare the processes used for deploying and moving virtualized computing resources, whether they be local or “in the cloud”, and look for overlap with your DR plans.
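One lightweight way to start that comparison is to list the named steps of each procedure and look at what they share. This is only a sketch under assumptions: the step names are invented, and it presumes both documents can be reduced to comparable step labels, which in practice takes manual normalization.

```python
def compare_procedures(daily_ops, dr_plan):
    """Split DR plan steps into those shared with daily operations and the rest.

    Both arguments are sets of step labels (hypothetical names).
    Returns (shared, dr_only) as sorted lists for stable, readable output.
    """
    shared = sorted(daily_ops & dr_plan)
    dr_only = sorted(dr_plan - daily_ops)
    return shared, dr_only


if __name__ == "__main__":
    # Step labels are invented for illustration only.
    daily_ops = {"provision VM from image", "migrate VM to another host", "restore data volume"}
    dr_plan = {
        "declare disaster",
        "notify staff",
        "provision VM from image",
        "migrate VM to another host",
        "restore data volume",
    }

    shared, dr_only = compare_procedures(daily_ops, dr_plan)
    print("Already covered by daily procedures:", shared)
    print("Emergency-only steps to keep in the DR plan:", dr_only)
```

The shared steps are candidates for replacing DR-specific instructions with references to standard procedures; what remains is the emergency-only material, much of it about people rather than systems.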