Storage Peer Incite: Notes from Wikibon’s April 15, 2008 Research Meeting
Moderator: Dave Vellante & Guest Analyst: Michael Crader of BT
Over the last two years, BT has conducted the largest, most intense Windows server consolidation and virtualization project in history, across the United Kingdom. Using VMware, it eliminated three large data centers entirely while cutting its power and cooling consumption enough to have a measurable positive impact on the nation's electrical grid and generation capacity reserve. As a result it won three prestigious European Union awards for decreasing its carbon footprint. Financially, this nationwide effort, which BT is now expanding worldwide, achieved full payback in its first eight months, cut the time needed to set up a new server from weeks to one business day, and made it possible for BT to complete a full daily backup of all its Wintel servers in half an hour. The project has also given BT huge credibility for entering a new business: consulting with other large enterprises on reducing their carbon footprints while increasing their operating efficiency and saving significant amounts of their operating budgets.
This experience puts the lie to the propaganda that says that constantly expanding carbon footprints are the cost of an expanding economy, that conservation requires huge investments that can wreck budgets and destroy corporations, and that therefore ecologists are by definition anti-business, anti-prosperity, and anti-jobs. This issue of the Wikibon newsletter describes what BT accomplished and discusses the implications for other organizations interested in undertaking similar projects. G. Berton Latamore
Dave Vellante with BT's Michael Crader, Head of Windows Consolidation
Today the Wikibon community heard from Michael Crader, the head of Windows Consolidation at BT, who presented on the two-year project he led to consolidate thousands of Windows servers across eight major facilities in the United Kingdom, using VMware and SAN storage.
In 2006 BT faced a crisis. Its UK data centers were out of capacity, physical space, and cooling capabilities. It was consuming 0.7% of all electricity produced by the nation, and energy costs were through the roof. It was failing to meet SLAs, and backup took days, leaving customer data at risk. To say the least, clients were not happy. And the high cost of real estate in the UK precluded building more data centers, which would have cost an estimated $120M US.
To remedy this situation, BT embarked on an ambitious project to consolidate more than 3,000 Wintel servers across the UK. The four primary business objectives of the project were:
- Improve information access;
- Reduce operational costs by lowering data center footprint and power and cooling expenses;
- Improve operational efficiency and increase systems utilization by virtualizing servers and storage;
- Respond quickly to business needs through a capacity-on-demand model.
The specific project objectives involved:
- Consolidating more than 3,000 existing Wintel servers;
- Achieving a consolidation ratio for servers of 15:1;
- Migrating to an ‘allocate-on-demand’ infrastructure;
- Implementing a best practice standard for all future Windows deployments.
In the words of Michael Crader, “It was vital that BT not institute tomorrow's legacy systems today.”
The Storage Imperative
BT achieved a server consolidation ratio of 15:1. To fully implement VMware, it synchronized server and storage virtualization and moved storage from a fragmented DAS model with a little bit of everything to a centralized SAN standardized on NetApp. This increased utilization drastically, cut backup time for the full system to half an hour, and allowed a small staff in a single centralized facility to manage and control the entire system.
BT chose NetApp because it made thin provisioning easy and supported unified NFS, iSCSI, and Fibre Channel access. BT used NetApp Snap technology to eliminate tape and replicate across a WAN. NetApp also provided a centralized managed service for BT.
The Benefits
The benefits were astounding. On average, every server BT shut off saved 700 watts of power. In total, BT saved approximately two megawatts of power, or $2.4M in annual energy costs. It cut the hardware required by 50%, increased storage utilization to 70%, reduced server maintenance costs by 90%, and a project that started in April 2006 had paid for itself by Christmas.
- 3,100 physical servers to 134;
- 700 racks at eight sites down to 40 at five sites;
- 2.1 megawatts of power consumed down to 0.24 megawatts;
- More than 9,000 network ports down to 840;
- Backup from 96 hours to a full daily in 30 minutes; and
- Server deployment from six weeks to one working day.
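The headline figures above are internally consistent, as a quick back-of-the-envelope check shows. The sketch below uses only numbers quoted in this article (700 watts per server, 3,100 servers down to 134, $2.4M annual savings); the implied electricity rate is our own derived figure, not one BT reported.

```python
# Back-of-the-envelope check of BT's reported energy savings, using only
# figures quoted in the article. Illustrative only, not BT's actual model.

WATTS_PER_SERVER = 700      # average savings per decommissioned server
SERVERS_BEFORE = 3100
SERVERS_AFTER = 134
HOURS_PER_YEAR = 24 * 365

servers_removed = SERVERS_BEFORE - SERVERS_AFTER
megawatts_saved = servers_removed * WATTS_PER_SERVER / 1_000_000
kwh_per_year = servers_removed * WATTS_PER_SERVER / 1000 * HOURS_PER_YEAR

print(f"Servers removed: {servers_removed}")
print(f"Power saved:     {megawatts_saved:.2f} MW")    # ~2.08 MW, close to the ~2 MW reported
print(f"Energy per year: {kwh_per_year / 1e6:.1f} GWh")

# Implied electricity rate if the $2.4M/year savings is taken at face value:
rate = 2_400_000 / kwh_per_year
print(f"Implied rate:    ${rate:.3f}/kWh")
```

The roughly 2.08 MW result lines up with both the "approximately two megawatts" figure and the 2.1 MW-to-0.24 MW reduction in the list above.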
BT got rid of maintenance contracts on 3,000 servers, eliminated antiquated backup processes, reduced maintenance expense, re-deployed technical staff, eliminated tape operators (tape loaders) and reduced restoration time from hours or days to seconds. More importantly, the firm disposed of (in an eco-friendly manner) 225 tons of equipment across two sites.
For its excellence in green IT, BT was recently recognized at the European Green IT Summit, where it took home three top awards.
Advice to Others
BT’s experiences with this project suggest the following advice to other users:
First, obtain buy-in from the business. As always, management support is critical, especially with virtualization, as many business owners will not want to virtualize ‘their boxes.’ Be prepared to sell this capability to the business based on better responsiveness, lower costs to the firm and eco-friendliness. As well, sprinkle in some virtual capacity (memory and disk) to sweeten the pot.
Virtualize test and development applications first to build credibility as a pilot. Use that ‘street credibility’ to build a strong story and sell management on the concept of saving many millions of dollars. But be prepared to develop in-house skills and ‘own’ the architecture.
Remember that not all applications can be virtualized, and some will require specialized storage for performance and other business imperatives (e.g. fax servers and heavy-hitting Citrix boxes).
Finally, be prepared to make investments up front. You must build a new infrastructure to prepare to exploit the efficiencies that synchronizing server and storage virtualization can bring.
Action item: Customers considering consolidation with virtualization should synchronize server and storage virtualization to gain maximum efficiencies such as those demonstrated by BT. Stories such as BT’s are compelling. However, users should be vigilant to address the root causes of data growth and be prepared to classify, migrate, archive, and where possible shred inactive data. Don’t just buy the hype; implement disciplined approaches to information management that complement technology investments.
An effective virtualization project will need significant investment if, as Michael Crader of BT says, you are going to avoid building “tomorrow's legacy systems today”. That means building a business case and implementation plan that will work for the specific workloads, current infrastructure, and organization. A good virtualization project entails a complete overhaul of the physical architecture, including servers, storage, and storage network. Even more important is a logical overhaul of the allocation, backup, recovery, compliance, archiving, and charge-back software and processes.
BT initiated a pilot for a specific workload (test and development). The team tested and revised the physical and logical architectures specifically for the type of workload in question (i.e., infrastructure computing). This allowed them to demonstrate to the CFO that significant investment was justified. The team extended that architecture to specific infrastructure applications based on Intel x86 architecture. Key BT storage decisions were using FC to avoid I/O bottlenecks, using storage controllers that allowed intermixing of all storage types, using very large numbers of virtual snaps for backups, using disk-to-disk backup, and using thin provisioning to over-allocate storage (almost 2:1).
The type of architecture developed and the applicability of virtualization will vary by workload type. Michael Crader’s team identified the characteristics of x86-based applications that were not suitable and did not virtualize them. The architecture and products used for BT’s larger UNIX systems are very different.
Action item: IT executives should ensure that virtualization projects include a complete re-architecture of the physical and logical infrastructure. Virtualization should include storage and servers. To be effective, significant new investment is required, and IT should demonstrate that the new architecture works for real in a pilot. Most important of all, the workload type, not products or fashion, should determine the architecture.
One of the interesting sidelines of BT's Windows consolidation story was the way that Michael Crader of BT described the carrot and stick approach to gaining business support to virtualize servers.
The carrot is a backdrop of energy efficiency and corporate responsibility blended with advertised lower support costs for virtualized environments. For the footdraggers, managers can even throw in some extra memory and disk space, all virtual of course, and double dip from the pool of resources.
The stick is a mirror image, using the corporate green initiative as a lever (you must do this to be green) and the threat of higher support costs for non-virtualized environments (kind of the same way large systems vendors jack maintenance to sell more mips).
The real kicker was taking action with notification that on "XX date these physical servers will be virtualized. Please notify us if this date is not convenient and we'll choose another. Please be aware that this event will occur unless you notify us of an alternate date."
As the saying goes, no deal is worth doing if you're not willing to walk. When it comes to virtualization, gaining management support to pull these types of stunts should be compulsory, or don't sign up for the project.
Action item: Organizations must be aware of and plan for the friction that will come about from initiating virtualization projects. 'You're not taking away MY server resources' will be the cry from the business. Use corporate green initiatives as a lever, turn support cost knobs and take a service supplier's mentality when negotiating with the business, meaning leverage critical mass to sweeten the pot where necessary.
If one Googles “VMware storage issues”, the hit list is long and the links fairly recent. As users have rushed to embrace server virtualization, they have placed demands on storage that put many storage vendors on the defensive. Indeed, at a recent VMworld conference there were several sessions on storage issues. And at another recent conference, users complained that VMware is not friendly to N_Port ID virtualization (NPIV), a new approach that ties virtual HBAs directly to individual guest hosts.
But, the first consideration must be certification. Many disk array products and/or configurations have not yet been certified by the virtualization software vendor.
Second, there is performance. For virtual server environments, storage performance is truly king. Indeed, many users claim VMware performance is fundamentally based on storage. Done right, the ability to scale both server and storage resources as needed is a tremendous benefit.
Next is storage virtualization. As layers of virtualization are added, measurement tools have a more difficult time providing accurate performance and capacity measurements, making capacity planning and tuning difficult.
Equally important is disk storage availability. A failure in a RAID group could end up impacting hundreds of servers. Thus vendors need RAID-6 or better and clever ways to manage the impact of rebuilds.
Traditional backups don’t work either, so disk-to-disk is mandatory and the snapshot copy capabilities must be robust and highly scalable and manageable. Even those using VMware’s own virtual machine file system (VMFS) say they enjoy less-than-ideal results, getting advanced features like file-level restore at the expense of placing a storage agent on each guest host.
Another requirement is virtual clones that allow a user to keep just one copy of a system image on disk and serve it up on multiple LUNs. Of course, booting over the storage network has to be bullet proof. And the storage has to understand that virtual servers can move dynamically and be able to shift things around in synch.
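The virtual-clone requirement can be illustrated with a toy copy-on-write model: many LUNs share one read-only base image, and each clone stores only its own changed blocks. This is a hypothetical simplification of what array-based cloning features (such as NetApp's FlexClone) provide in the controller; the class and names below are ours, for illustration only.

```python
# Toy copy-on-write clone: many virtual LUNs share one read-only base image
# and store only their own changed blocks. Illustrative sketch, not a real
# array implementation.

class CloneLun:
    def __init__(self, base_image):
        self.base = base_image      # shared, read-only mapping: block -> data
        self.delta = {}             # blocks this clone has overwritten

    def read(self, block):
        # A clone-local write wins; otherwise fall through to the shared base.
        return self.delta.get(block, self.base.get(block))

    def write(self, block, data):
        self.delta[block] = data    # the base image is never modified

golden = {0: "bootloader", 1: "kernel", 2: "config"}
vm_a, vm_b = CloneLun(golden), CloneLun(golden)
vm_a.write(2, "config-A")

print(vm_a.read(2))  # 'config-A'  (clone-local change)
print(vm_b.read(2))  # 'config'    (still sees the shared base)
```

Because only deltas consume space, hundreds of virtual machines can boot from one golden image, which is exactly why booting over the storage network has to be bullet proof.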
Then there is thin provisioning, whose many benefits create temptations. In this highly dynamic environment it is important that storage can be provisioned and reclaimed after use with the same simplicity as the virtual servers. A good commentary on this can be found at Thin provisioning: Look before you leap.
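The essential bookkeeping behind thin provisioning is simple: logical capacity promised to hosts can exceed physical capacity, but actual writes cannot. The sketch below is our own hypothetical model (BT over-allocated at almost 2:1, per the architecture discussion above); real arrays enforce this in the controller.

```python
# Minimal sketch of thin-provisioning bookkeeping. Hypothetical model for
# illustration; vendor arrays implement this logic in the controller.

class ThinPool:
    def __init__(self, physical_gb, max_oversubscription=2.0):
        self.physical_gb = physical_gb
        self.max_oversub = max_oversubscription  # e.g. BT's ~2:1 ceiling
        self.allocated_gb = 0.0   # logical capacity promised to hosts
        self.used_gb = 0.0        # blocks actually written

    def provision(self, gb):
        """Promise logical capacity without consuming physical space."""
        if (self.allocated_gb + gb) / self.physical_gb > self.max_oversub:
            raise ValueError("over-subscription limit reached")
        self.allocated_gb += gb

    def write(self, gb):
        """Writing data consumes real physical space."""
        if self.used_gb + gb > self.physical_gb:
            raise ValueError("physical pool exhausted")
        self.used_gb += gb

    def utilization(self):
        return self.used_gb / self.physical_gb

pool = ThinPool(physical_gb=100)
pool.provision(180)      # 1.8:1 over-subscription, within the ceiling
pool.write(70)
print(f"{pool.utilization():.0%} physical utilization")  # 70%, BT's reported level
```

The temptation is obvious from the model: promised capacity can quietly approach the ceiling while physical utilization looks healthy, which is why reclaiming space after use matters as much as allocating it.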
For convenience and cost, products offering multiple host interfaces are ideal. The list includes Fibre Channel, iSCSI, SAS, NFS and CIFS.
Finally, the biggest dangers generally are the result of configuration issues. When one starts to virtualize and abstract resources, it is very easy to end up with resources put together that don’t belong together. Thus users need to consider cross-domain reporting and monitoring tools, such as Akorri’s BalancePoint and Onaro’s SANScreen (now owned by NetApp) to improve troubleshooting, tuning and change management.
Not only does the storage industry need to meet all these requirements; it must also develop and market specialized software for environments like Oracle and SQL (e.g., SnapManager for Oracle) and make this a margin opportunity. Better techniques that provide consistency across volumes are also badly needed.
Action item: The storage industry must recognize the impact of server virtualization on storage requirements and embrace a new way of doing things that reflects the needs of virtualized environments --- and do it quickly!
In Wikibon parlance, GRS stands for 'getting rid of stuff.' There is no doubt that from a GRS perspective, BT's Windows consolidation project succeeded. The most amazing part was the elimination and disposal (ecologically friendly of course) of more than 200 tons of old equipment. The rest of the GRS story is well documented:
- 3000+ servers down to less than 150
- 700 racks at 8 sites down to 40 at 5 sites
- Reduction in energy costs of more than $2.4M annually
- 9000+ network ports down to less than 850
- Backup from four days to a full daily in 30 minutes
- Unused capacity declined precipitously as storage utilization shot from the low 20's to 70%
- 6 weeks to deploy new servers down to 1 working day
The danger in stories like this is that companies get lulled into a sense that technology can be applied to solve efficiency problems for a much wider set of applications than what BT has wisely chosen for virtualization. BT's VMware applications are infrastructure related such as Web, firewalls and smaller databases that can be considered point systems. Over time, application creep into virtualized environments could expose the fundamental lack of disciplined approaches to classification and automation of policies for data migration, archiving and shredding. This is the root cause of waste and data growth in many applications, and users should address this problem head on.
Action item: Choosing virtualization applications wisely (e.g. infrastructure apps) will help get rid of tons of stuff, literally. Customers should beware of falling prey to promises that virtualization will achieve similar results more broadly. In these less virtualization-friendly environments, there is no getting around the need for better information management, starting with classification and the automation of policies to migrate, archive and ultimately get rid of unused data.
Virtualization is gaining popularity every day as a tool to reduce complexity, decrease costs and make a data center "greener." Virtual machines (VMs) are actually software files. During large virtualization and consolidation projects it is easy to create VMs, but they can quickly proliferate, and keeping track of their usefulness can add complexity back into the management equation as it becomes hard to know when to retire a VM. VMware, for example, offers a tool called VMware Lifecycle Manager to take VMware-generated virtual machines out of service when they are no longer useful; VMs created by other virtualization software will require their own tools to de-activate them.
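The core task a lifecycle tool automates is mundane: track when each VM was last used and flag the ones idle past a retirement window. The sketch below is purely hypothetical (the inventory, names, and 90-day window are our own illustration, not VMware Lifecycle Manager's actual logic).

```python
# Hypothetical sketch of the bookkeeping behind VM retirement: flag VMs
# idle longer than a retirement window. Names and policy are illustrative.

from datetime import date, timedelta

def stale_vms(inventory, today, max_idle_days=90):
    """Return names of VMs whose last use predates the retirement cutoff."""
    cutoff = today - timedelta(days=max_idle_days)
    return [name for name, last_used in inventory.items() if last_used < cutoff]

inventory = {
    "test-web-01": date(2008, 1, 3),    # idle since early January
    "dev-sql-07":  date(2008, 4, 1),    # recently used
    "old-pilot":   date(2007, 11, 20),  # idle since last autumn
}
print(stale_vms(inventory, today=date(2008, 4, 15)))  # ['test-web-01', 'old-pilot']
```

Without even this much discipline, the VM count only grows, and the complexity that consolidation removed creeps back in.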
Storage virtualization can reduce complexity by providing a single point for measurement and management. However, as layers of virtualization are added, measurement tools have a more difficult time providing accurate performance and capacity measurements, making capacity planning and tuning difficult. In any virtualization effort, normally scarce staff resources have to be made available for a period of time committing to completing what can be a sizeable process.
Action item: Don’t just fall prey to all the hype. Virtualization is like a drug: it really helps at first, but costly side-effects often appear. Users still need an overall information management discipline in the wake of virtualization efforts to keep the IT infrastructure functioning without a hitch.