Note: This is an excerpt from a paper presented at CMG 2008 by Amy Spellmann (Optimal Innovations), Richard Gimarc (Hyperformix) and Charles Gimarc (LSI Corporation). For more detail, please see the full paper: Green Capacity Planning
Managing today’s datacenters demands consideration of energy use in addition to the traditional goals of meeting service level agreements (SLAs) for availability and performance. IT costs are rising as the price of electricity increases. Since datacenters have limited electrical, cooling, and space capacity, it is essential to determine when those limits will be reached. At the same time, there is growing social concern about the impact on our limited natural resources.
Green Capacity Planning (GCP) is a new approach to capacity management that includes optimization of energy and equipment costs as well as server capacity, performance, and response times. GCP extends the best practices of mainframe, server, and network capacity planning with a new resource, energy, and with energy footprint and cost projections. GCP does not change the capacity planner’s goal of determining a cost-effective way to deal with change while meeting the business SLA; it adds energy as a new ingredient. By applying GCP, the need for datacenter expansion or construction can often be delayed.
Capacity Planning Process
GCP is the process of predicting when future business demand will exceed the availability of IT equipment, energy and space in the datacenter and then determining the most cost-effective way to meet SLAs and delay saturation. This process is illustrated in the figure below. The green boxes indicate tasks that are new to the GCP process.
Traditional capacity planning develops a plan for increasing the number of servers and the amount of storage as load increases over time. GCP adds another dimension, power utilization, optimizing and planning system expansion to meet customer SLAs under additional constraints on power and datacenter space.
Capacity planning starts and ends with business requirements. What is the workload? How does the load vary over the course of a day (intra-day view)? How are the workload and data capacity expected to change over the longer term (inter-day view)? What SLA is the datacenter expected to achieve throughout the period of workload and capacity growth? Over what time period does the plan extend?
Once the business requirements are known, the iterative GCP process begins, first by analyzing the existing datacenter. What is the computational requirement? This has no real upper limit, since technology changes enable greater capability and allow more compute power in smaller spaces, with less electrical power and heat, and with less system management. What is the storage requirement? Technology changes allow more capacity with fewer resources. At what points in time must the servers, storage, and network be upgraded to support changing capacity and SLA requirements? These three components, servers, storage, and network, are the key systems within the datacenter.
Next, the energy footprint is computed based on the upgrade plan. The energy requirements may vary over the course of a day, depending upon changes in the daily load, and will certainly vary over months as business requirements change.
Once we have the energy requirement of the IT equipment, the energy requirement of the entire datacenter, the energy footprint, is computed. The Green Grid defines the Power Usage Effectiveness (PUE) metric for datacenter infrastructure power as: PUE = Total Facility Power / IT Equipment Power.
For example, a PUE of 2.0 implies that for every watt of power used by servers, storage, and networks, an additional watt is used by the site infrastructure to support the IT equipment. A PUE of 1.0 implies that all power consumed by the datacenter is used exclusively for IT equipment; no additional site infrastructure (e.g., cooling) is required. A typical PUE is around 2.0.
Using PUE, the energy footprint is: Energy Footprint = PUE × IT Energy. Once the IT equipment and energy requirements are known, the total datacenter cost is computed; this includes the cost of both the IT equipment and the energy. The cost of energy for a system is its energy footprint (in kWh) multiplied by the cost of energy (dollars per kWh). Eventually, datacenter energy and cooling demands may exceed the capacity of the facility. The GCP process then loops back to the choice of IT equipment, making different selections until the total datacenter cost is optimized.
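As a concrete illustration, the sketch below computes an energy footprint and monthly energy cost from an IT load using the formulas above. The PUE, tariff, and per-server wattage are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch of the energy-footprint and cost calculation described above.
# PUE, tariff, and wattages are assumed values for illustration only.

PUE = 2.0              # typical Power Usage Effectiveness
ENERGY_COST = 0.10     # assumed tariff, dollars per kWh
HOURS_PER_MONTH = 730  # average hours in a month

def energy_footprint_kwh(it_power_watts: float, hours: float = HOURS_PER_MONTH) -> float:
    """Total facility energy (kWh) implied by the IT load and the PUE."""
    it_energy_kwh = it_power_watts / 1000.0 * hours
    return PUE * it_energy_kwh          # Energy Footprint = PUE x IT Energy

def monthly_energy_cost(it_power_watts: float) -> float:
    """Energy cost = energy footprint (kWh) x cost of energy ($/kWh)."""
    return energy_footprint_kwh(it_power_watts) * ENERGY_COST

# Example: 24 servers drawing an assumed average of 450 W each
it_load = 24 * 450.0
print(f"Footprint: {energy_footprint_kwh(it_load):,.0f} kWh/month")
print(f"Cost:      ${monthly_energy_cost(it_load):,.2f}/month")
```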
When the optimization loop completes, a plan for upgrading the datacenter IT equipment and infrastructure is developed. The plan indicates the points in time at which servers and storage are added and estimates how workload, storage and compute capacity, electrical and cooling load, and costs change throughout the planning period.
Power usage by servers and storage can be estimated between two limits: idle power consumption (active idle) and maximum power consumption (processor intensive). By assuming that server and storage power usage scales linearly with CPU utilization, power use over time can be estimated as the datacenter load varies.
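A minimal sketch of this linear interpolation follows; the idle and maximum wattages are assumed values, not measurements from the paper.

```python
# Sketch of the linear power model described above: power scales linearly with
# CPU utilization between active-idle and maximum draw.

def server_power_watts(utilization: float,
                       idle_watts: float = 200.0,
                       max_watts: float = 450.0) -> float:
    """Estimate instantaneous power draw at a given CPU utilization (0.0-1.0)."""
    u = min(max(utilization, 0.0), 1.0)   # clamp to the valid range
    return idle_watts + u * (max_watts - idle_watts)

# As datacenter load varies over a day, sum the per-server estimates:
hourly_util = [0.25, 0.40, 0.70, 0.55]   # hypothetical intra-day samples
for u in hourly_util:
    print(f"{u:4.0%} utilization -> {server_power_watts(u):6.1f} W per server")
```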
Case Study
In a published TPC-W benchmark, a datacenter of 24 servers supports a Web-based retail business with a database of over 100,000 items, 76,000 users connected via Web browsers, and a throughput of 9,700 transactions per second. The example system is then grown at a rate of 5% per month for 24 months, a compound growth factor of roughly 3.2 over the period. Growth occurs in all aspects of the business: size of the database, number of users, and throughput, all while maintaining the initial SLA. We also assume that throughout the 24 months CPU utilization will never exceed 70%, response times will never exceed those at the start of the period, and energy costs are minimized. This example does not consider network equipment; however, the GCP process applies to all IT equipment.
The choice of this particular system and benchmark as an illustration has three benefits: a publicly available, detailed description of a non-trivial application and its implementation; sufficient information to create a model; and the use of commercially available IT equipment for developing an upgrade plan. A server capacity modeling tool may be used to determine the infrastructure required to successfully grow the workload by 5% per month over the two-year planning horizon.
Two different growth scenarios are considered:
- Server Upgrade: Add a new server whenever any server reaches the 70% utilization threshold;
- Virtualization: Instead of upgrading individual servers, virtualize the saturated server(s).
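To make the trigger mechanics concrete, the sketch below grows the workload 5% per month and adds capacity whenever average utilization would cross the 70% threshold. It is a simplified model under an assumed starting utilization: it adds identical baseline-sized servers purely to illustrate the loop, whereas the plan in the paper replaces saturated servers with larger or virtualized ones, so its server counts differ from those reported below.

```python
# Illustrative sketch of the 70%-threshold upgrade trigger under 5% monthly
# growth. Baseline utilization and the identical-server assumption are
# simplifications, not the paper's actual upgrade plan.

GROWTH = 1.05        # 5% per month
THRESHOLD = 0.70     # CPU utilization ceiling from the SLA
MONTHS = 24

servers = 24
load = servers * 0.50            # assume a 50% average baseline utilization

upgrade_months = []
for month in range(1, MONTHS + 1):
    load *= GROWTH
    # add capacity until average utilization is back under the threshold
    while load / servers > THRESHOLD:
        servers += 1
        if month not in upgrade_months:
            upgrade_months.append(month)

print(f"Servers after {MONTHS} months: {servers}")
print(f"Upgrade points (months): {upgrade_months}")
print(f"Workload growth factor: {GROWTH ** MONTHS:.2f}")   # about 3.2x
```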
In the Server Upgrade scenario, over the 24-month period, the system changed:
- From 24 to 27 servers; new servers are larger than the ones replaced.
- From 15.6 to 20.6 MWh of energy per month.
- 6 upgrade points.
- $150,000 in cumulative costs for equipment and energy.
In the Virtualization scenario, over the 24-month period, the system changed:
- From 24 to 6 servers; new servers are much larger than the ones replaced.
- From 15.6 to 7.7 MWh of energy per month.
- 6 upgrade points.
- $120,000 in cumulative costs for equipment and energy.
Virtualization is used to reduce the server count and energy footprint. At the end of two years, the expanded virtualized system has 3.2 times the capacity of the baseline at approximately half the energy demand. Storage is upgraded in the same way for both scenarios: over the 24-month period, storage grew from 41 to 133 disks and from 1.3 TB to 4.3 TB of capacity, while rack space dropped from 15U to 14U. The upgrade includes an initial change from parallel SCSI to SAS storage, greatly increasing density and reducing energy consumption.
Action Item: The two scenarios show that the virtualization approach has a significantly smaller energy footprint and a reduced energy cost. The energy footprint of the virtualized configuration at the end of two years is approximately half that of the original baseline configuration. At the end of the two-year plan, the cost of the Server Upgrade scenario exceeded the Virtualization scenario by $30,000. Both systems achieved the same SLA at the end of the study period, but with vastly different IT equipment requirements.
In the illustration, virtualization is used as a mechanism for reducing the energy footprint of the system. While virtualization is a good choice from a cost and energy-use standpoint, the key, as always, is to model the virtualized environment and determine SLA achievability, scalability, and projected energy footprints prior to acquisition, implementation, and deployment.
IT managers are concerned with adding computational capability to their datacenters to satisfy the requirements of a growing business. Since space is a limited resource in a datacenter, to grow we must process more transactions and store more data per cubic foot of space. Electricity is also a finite resource, since datacenters have an upper limit on the amount of power they can draw based on the number of circuits and circuit capacity. To effectively scale a datacenter of fixed size, the IT manager must find a way to make the system more efficient. Green Capacity Planning helps IT managers plan, specify, and deploy systems of increasing computational density, power, and storage capacity that will meet the business’s SLA and fit within finite physical resource constraints.