Wikibon Energy Lab Green Validation Report:
3PAR InServ Virtualized Storage Arrays
Introduction to this report
Wikibon Energy Lab Validation Reports are designed to assist customers in understanding the degree to which a product contributes to energy efficiency. The four main goals of these studies are to:
- Validate the hardware energy efficiency of a particular technology as compared to an established baseline.
- Assess the potential contribution of software technologies to power savings, and validate the actual contribution in real-world installations.
- Quantify the contribution of the hardware and software technologies to a green data center.
- Educate business, technology and utility industry professionals on the impact of technologies on reducing energy consumption.
Our objective is to identify not only the hardware energy consumption but importantly the often overlooked and hard to quantify green software aspects of technologies. Wikibon Energy Lab Validation Reports are submitted to utilities such as Pacific Gas & Electric Company as part of an energy incentive qualification process.
Wikibon Energy Lab defines and validates the hardware testing procedures used to determine the energy consumed by specific products in various configurations. Wikibon also reviews actual customer results achieved in the field to validate the effectiveness of these technologies, based on analysis of real-world field data. These proof points are mandatory for the utility company to qualify a specific vendor's technology for energy incentives.
Wikibon Energy Lab Reports are not sponsored. Rather they are deliverables required by PG&E and other utilities as part of an incentive qualification process. As part of its Conserve IT Program, Wikibon is paid by the vendor to perform services associated with securing incentive rebates from utilities for end customers that acquire the vendor's technologies. To ensure this process is completely independent, Wikibon lab and field results are sometimes vetted by a third party engineering firm hired by PG&E or other utilities.
Wikibon only produces Lab Validation Reports for technologies that have been qualified for rebate incentives by PG&E or other utilities and have passed strict utility company guidelines. By adhering to this criterion, Wikibon assures its community of the independence of these results.
Executive Summary
Wikibon reviewed the power measurements made on 3PAR’s InServ E200, S400 and S800 storage arrays. These arrays are built from standard components, and enable the power of different configurations to be calculated. The measurements were made at the component level. Table 1 gives the results of the measurements and is repeated in the Drive Measurements section of this report.
The measurements were made across three different workloads:
- Sequential read, with a 512K block size
- Random read, with a 16K block size
- Idle (no data being transferred)
The results of the three workloads were averaged to produce an overall figure of power consumption for each of the components of the 3PAR arrays. Wikibon believes that the benchmarks used and the power measurements made were done professionally and in good faith. In the opinion of Wikibon, this measurement represents a good and reasonable estimate of the power consumed in real-world applications across a number of drives for applications typically found in a data center.
In addition to these hardware measurements, Wikibon conducted a statistical analysis of the benefits that 3PAR customers had achieved with the use of virtualization and thin provisioning software in the field. The analysis showed that the likely direct saving in disk storage capacity would be a reduction of at least 35% and often more. This saving is the direct result of higher storage utilization, which lowers the number of spinning disks required to meet application needs and thereby reduces energy consumption. Details of this analysis are provided in this report and here.
Used in combination, these results represent a complete and accurate reflection of the actual power that would be consumed by 3PAR’s storage array products in a data center environment, and the likely savings that virtualization and thin provisioning will achieve.
David Floyer, CTO
Wikibon Energy Labs
Mountain View, CA
Technology Background: Virtualization and Thin Provisioning
The fundamental building block of storage systems is a storage volume. A storage volume can be (and usually is) smaller than a disk drive, but can also be bigger and span multiple storage devices. The operating system “mounts” a storage volume to be able to read, or read and write, that volume. For example, a PC has one or more disk drives. A number of logical drives (each equivalent to a storage volume) can be made from these disk drives (C: drive, D: drive, etc.). The space on the disk drives is formatted and allocated to these logical drives when they are created. External volumes can be accessed from a PC over a LAN or WAN.
There are a number of problems with this approach, including:
- Space is allocated when the storage volume is created; a large amount of space is initially allocated but not used.
- Over time, the data becomes fragmented, for example when data is deleted but the space is not released. Think of trying to fit foreign pieces into a puzzle when they are either too large or too small for the available space.
- It is difficult to move volumes around to optimize storage, as the application has to be stopped while the volume is moved. This limits the ability to reorganize the data and take advantage of spare capacity on the disk drives.
- When a copy of a volume is required, the whole storage volume has to be copied.
Virtualization breaks the connection between what is written, what is allocated, and what is unused. This concept was first applied by IBM in the 1960s to the storage on the CP/67 server operating system, which later became VM with virtual disks.
3PAR (and other vendors) have integrated virtualization into the storage operating system. Virtualization allows a number of techniques to be used to reduce the amount of data that is actually stored:
Thin Copying
The traditional way of copying an active volume (one that is being read and written to) is to stop the application and copy the volume. This is required because updates written to the volume while it is being copied would leave the two copies out of sync. When the copy is complete (this can take a significant time for large volumes), the application can be restarted.
A more modern way of copying an active volume is to take a “Snap copy.” The storage controller starts to copy the data from one volume to another, and caches all the writes. The controller then ensures that the data is updated appropriately in both volumes until the volume copy is complete, when the target (new) volume is automatically disconnected. This approach allows the application to continue, but still requires multiple copies of the volume to be stored on disk, and still requires a large number of disk actuator movements to complete the copy.
Thin copying is a different technique which allows a virtual copy to be made. The virtual copy is only updated with the original data if the source (original) volume changes. The average amount of data that is updated on a volume during a twenty-four hour period is usually small (<5%), which means that multiple copies can be made over time with minimal overhead.
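The idea of a virtual copy that stores only changed data can be illustrated with a minimal copy-on-write sketch. This is an illustration of the general technique, not 3PAR's implementation; the class names `Volume` and `Snapshot` and the block-level granularity are assumptions made for the example.

```python
class Snapshot:
    """A virtual copy: holds only the blocks that changed after it was taken."""

    def __init__(self, source):
        self.source = source
        self.preserved = {}  # block number -> original contents

    def read(self, block_no):
        # Changed blocks come from the preserved copy, the rest from the source.
        if block_no in self.preserved:
            return self.preserved[block_no]
        return self.source.blocks.get(block_no)


class Volume:
    """A toy volume modeled as a mapping of block number -> data."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def snapshot(self):
        # Taking the snapshot copies no data at all.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, block_no, data):
        # Before overwriting, save the old contents in every snapshot
        # that has not already preserved this block.
        for snap in self.snapshots:
            if block_no not in snap.preserved:
                snap.preserved[block_no] = self.blocks.get(block_no)
        self.blocks[block_no] = data


vol = Volume({0: "a", 1: "b", 2: "c"})
snap = vol.snapshot()
vol.write(1, "B")            # only block 1 changes after the snapshot
print(snap.read(1), vol.blocks[1], len(snap.preserved))
```

In this sketch the snapshot consumes space only for the one block that changed, which is why many copies can be kept when the daily change rate is small.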
Thin Provisioning
Figure 1 provides a diagram of how thin provisioning works. As discussed above, the server believes that it has the storage allocated to it in a storage volume, just as it would in the traditional (fat) provisioning environment. The allocation is a virtual allocation, and real physical space is dedicated only when some data is actually written to that volume. The unit of data written in the 3PAR system is a “Chunklet,” which is only 256 megabytes (MB) in size. These chunklets are held in a common storage pool, and dedicated to a volume when it is written. As the pool serves a large number of volumes, the probability that all the storage will be requested together is vanishingly small.
Thin provisioning does not work with all operating systems and file systems; care has to be taken in setting the correct parameters to avoid allocating a large amount of space to a server and then finding that the file system or database has written to all of the space available. As storage virtualization and thin provisioning have matured, software developers have also ensured that their software can take advantage of these features.
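The allocate-on-write behavior described above can be sketched as follows. This is a simplified model rather than 3PAR's design: `ThinPool`, `ThinVolume`, and the abstract "units" of allocation are hypothetical names chosen for the example.

```python
class ThinPool:
    """A shared pool of physical allocation units."""

    def __init__(self, physical_units):
        self.free_units = physical_units

    def allocate(self):
        if self.free_units == 0:
            raise RuntimeError("pool exhausted: physical capacity must be added")
        self.free_units -= 1


class ThinVolume:
    """A volume whose logical size is promised up front but backed lazily."""

    def __init__(self, pool, logical_units):
        self.pool = pool
        self.logical_units = logical_units  # what the server sees
        self.mapped = {}                    # unit number -> data, written units only

    def write(self, unit_no, data):
        if not 0 <= unit_no < self.logical_units:
            raise IndexError("write beyond the logical size of the volume")
        if unit_no not in self.mapped:
            self.pool.allocate()            # physical space is taken on first write
        self.mapped[unit_no] = data

    @property
    def physical_units(self):
        return len(self.mapped)             # what has actually been written


pool = ThinPool(physical_units=100)
volumes = [ThinVolume(pool, logical_units=80) for _ in range(3)]  # 240 logical units
volumes[0].write(0, "x")
volumes[0].write(0, "y")   # rewriting the same unit takes no new space
volumes[1].write(5, "z")
print(pool.free_units, [v.physical_units for v in volumes])
```

The three volumes together promise more logical space than the pool holds, yet only the units actually written draw on physical capacity.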
3PAR's InServ virtualized storage arrays
Equipment Measured
The 3PAR storage arrays measured were the 3PAR InServ S800, S400, and E200 arrays.
Components of 3PAR Storage Equipment
The following diagrams describe the components of the 3PAR S400/S800 array.
Power Measurement Approach
Measuring Equipment Used
A PowerSight true RMS analyzer PS-3000 was borrowed from PG&E between 3/17/2008 and 3/31/2008. Measurements were repeated between sixteen and forty times.
Location of Testing
The testing was done in 3PAR’s development facility at 4209 Technology Drive, Fremont, CA 94538 (tel: 510-413-5999). The testing was overseen by Sean Etaati, Engineering Manager, at the above address and phone number.
Measurement Methodology
The key component of a storage array whose power consumption varies with workload is the drive chassis, because changes in I/O rate change the actuator movement on the drives. The 3PAR S400 and S800 contain up to 40 drives per chassis, and the E200 contains up to 16 drives per chassis. Most of the measurement effort was dedicated to the drive chassis, to measure power consumption with different drive types and different workloads. The relevant properties of the disk drives are:
- Capacity - The amount of data held, measured in gigabytes (GB), where one GB is equivalent to 1,000 million bytes
- Spin Speed - The rotational speed of the disk, measured in rpm (e.g., 15K = 15,000 revolutions per minute)
- Transfer Rate - The maximum transfer rate from the disk, measured in gigabits (Gb)/sec, where one gigabit = 1,000 million bits
- Interface - The means by which the drive connects and exchanges data with the host computer. 3PAR arrays support either FC (Fibre Channel) or SATA (Serial Advanced Technology Attachment) interfaces.
The drive types used by 3PAR are:
- Seagate FC 147GB 15k @ 4Gb
- Seagate FC 400GB 10k @ 4Gb
- Seagate Enterprise SATA 750GB 7.2k @ 4Gb
The measurements were made with three different workloads:
- Sequential Read, with a 512K block size
- Random read, with a 16K block size
- Idle (no data being transferred)
Read benchmarks were used for practical reasons: it is much quicker to reset after a read benchmark is run, which accelerates the preparation for the next run. The main power consumers in disk devices are actuator movement and media rotation, which are similar for reads and writes.
The three benchmarks were averaged to produce an overall figure of power consumption for the 3PAR arrays. In Wikibon’s opinion, this measurement represents a good and reasonable estimate of the power consumed in real-world applications across a number of drives for “typical” combinations of applications found in a data center.
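The averaging step can be sketched as a simple unweighted mean across the three workloads. The wattage values below are hypothetical placeholders, not measurements from this report, and the function name `average_power` is an assumption made for the example.

```python
from statistics import mean


def average_power(watts_by_workload):
    """Return the unweighted mean power (in watts) across the workloads."""
    return mean(watts_by_workload.values())


# Hypothetical readings for one component, in watts (illustrative only).
example_drive = {
    "sequential_read_512K": 14.0,
    "random_read_16K": 16.0,
    "idle": 12.0,
}

print(average_power(example_drive))
```

An unweighted mean treats the array as spending equal time in each state; a site with a known workload mix could substitute a weighted mean.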
Setting the Baseline
The baseline for this study was set by determining the power requirements of the 3PAR arrays assuming no virtualization and thin provisioning. The starting point was the configuration that would be required for a modular array without virtualization and thin provisioning. We measured the 3PAR array in this configuration to create the baseline for comparison.
This approach essentially 'turned off' the effects of thin provisioning and virtualization on the 3PAR equipment.
InServ Configuration with Virtualization and Thin Provisioning
We then 'turned on' the configuration required to provide the same number of I/O's, ports, and other resources but with fewer spinning disk drives required due to the impacts of virtualization and thin provisioning. These configurations were based on actual field data from 3PAR customers reporting allocated versus written capacity as described in the following section of this report.
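Because power was measured at the component level, the baseline and the thin-provisioned configurations can be compared by summing component figures. The sketch below assumes a simple additive model; the wattages and the 200-drive baseline are hypothetical, while the 40 drives per chassis and the 35% capacity reduction are taken from this report. The function names are assumptions made for the example.

```python
import math


def array_power_watts(n_drives, drive_w, drives_per_chassis, chassis_w, controller_w):
    """Additive model: drives + the chassis needed to hold them + controllers."""
    n_chassis = math.ceil(n_drives / drives_per_chassis)
    return n_drives * drive_w + n_chassis * chassis_w + controller_w


def thin_drive_count(baseline_drives, capacity_saving_pct):
    """Drives still required after a given percentage reduction in capacity."""
    return math.ceil(baseline_drives * (100 - capacity_saving_pct) / 100)


# Hypothetical component figures in watts (illustrative only).
DRIVE_W, CHASSIS_W, CONTROLLER_W = 15, 100, 500
DRIVES_PER_CHASSIS = 40          # S400/S800 chassis size, from the report

baseline_drives = 200            # hypothetical baseline configuration
thin_drives = thin_drive_count(baseline_drives, capacity_saving_pct=35)

baseline_w = array_power_watts(baseline_drives, DRIVE_W, DRIVES_PER_CHASSIS,
                               CHASSIS_W, CONTROLLER_W)
thin_w = array_power_watts(thin_drives, DRIVE_W, DRIVES_PER_CHASSIS,
                           CHASSIS_W, CONTROLLER_W)
print(thin_drives, baseline_w, thin_w)
```

Controllers and ports are held constant between the two cases, matching the requirement that both configurations provide the same I/O resources.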
Virtualization & Thin Provisioning Measurements
Virtualization and thin provisioning are relatively new technologies that significantly reduce the amount of storage that is required, and improve the utilization of that storage. A detailed description of the technology is given above. The purpose of this section of the Energy Lab report is to estimate the level of saving that can be achieved with virtualization and thin provisioning software.
3PAR offers a “Call-Home” program that is used to help predict component failures. It is subscribed to by a good percentage of its customer base. As part of the prevention analysis, the system captures metadata about disk usage and function usage from the service processor in each array connected to the system, including metadata about each volume that has been created on the arrays. No customer data is transmitted. The database of this metadata allows the data to be analyzed on a customer-by-customer basis, taking an average for all the volumes at each customer. This is a good way to observe the data, as an individual customer typically organizes volumes consistently.
The key metric we wanted to study is the improvement in utilization that virtualization and thin provisioning can achieve. For each volume there is a logical size (what the server sees) and a physical size (what has actually been written). In traditional storage arrays, the physical and logical sizes are similar. With virtualization and thin provisioning, the physical size is significantly smaller than the logical size; that is, the application thinks it has more storage than is physically installed, allowing less storage capacity to be provisioned.
As shown in Figure 8, Wikibon performed a statistical analysis of this database by installation and found that the actual physical disk capacity required to store data was only 33% of the logical capacity (# customers = 115). From an energy point of view, roughly two-thirds (about 67%) of the disks were not required. However, a closer analysis of the data showed that a few of the customers had inordinately large gains. Upon investigation, Wikibon discovered that customers were finding new ways of exploiting the virtualization and thin provisioning technology. For example, customers were making multiple thin copies of data many times per hour, so that they could recover quickly to an earlier copy if data was corrupted (a continuous data protection (CDP)-like approach). This is very efficient with thin copying, because only changed data has to be written. The amount of data saved can be very high indeed.
However, from an energy rebate point of view, this is not a savings in energy, because the customer was not actually making physical copies before. To make an apples-to-apples comparison and to estimate the true potential power savings, Wikibon assumed that customers with savings in disk space of 50% or more had changed the procedures on how they were storing data. Wikibon isolated the subset of customers (n=42) that had less than 50% savings. The weighted average saving was 35%. Wikibon believes that this figure is a good predictor of the direct reduction in disk capacity that will be realized by customers. This is the default figure that has been agreed to by PG&E as the basis for incentive rebate calculations.
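The filtering and averaging step can be sketched as below. The report does not state how the average was weighted; weighting by logical capacity is an assumption here, as are the function name `weighted_saving` and the sample data, which are invented for illustration.

```python
def weighted_saving(customers, cutoff=0.50):
    """Capacity-weighted saving over customers whose own saving is below the cutoff.

    customers: list of (logical_tb, physical_tb) pairs, one per customer.
    """
    kept = [(logical, physical) for logical, physical in customers
            if 1 - physical / logical < cutoff]
    if not kept:
        raise ValueError("no customers below the cutoff")
    total_logical = sum(logical for logical, _ in kept)
    total_physical = sum(physical for _, physical in kept)
    return 1 - total_physical / total_logical


# Hypothetical customers (logical TB, physical TB); not data from the study.
sample = [
    (100, 70),   # 30% saving, kept
    (50, 30),    # 40% saving, kept
    (200, 20),   # 90% saving, excluded as a changed-procedure outlier
]
print(round(weighted_saving(sample), 3))
```

Excluding the high-saving customers gives a more conservative figure, since their gains reflect new usage patterns rather than disks that would otherwise have been installed.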
Component Measurements
With an understanding of the utilization impacts of virtualization and thin provisioning, we then needed to measure both drives and controllers at the component level in a variety of configurations.
Drive Measurement Results
The tables and charts below summarize the power results from the drive measurements taken by 3PAR and validated by Wikibon. The detailed measurements are shown in Appendix I. The results of the analysis allow the power consumption of each of the drive components (drive chassis and each disk drive type) to be calculated. The average figures are used in the final results shown in Table 1.
The characteristics of the benchmarks run are given in the table below.
Controller Measurement Results
The power characteristics of the S400/S800 controller nodes are given in the table below. Battery backup and power domains are included in the controller measurements. The measurements were designed so that a power rating for an HBA and S400/800 Controller could be derived. The final averages are used in Table 1.
The power characteristics of the E200 controller node are given in the table below. Battery backup and power domains are included in the controller measurements. The measurements were designed so that a power rating for an HBA and E200 Controller could be derived. The final averages are used in Table 1.
The benchmark characteristics for the S400/S800 and E200 controller nodes are given in Table 5 below.
Field Validation
Once PG&E approved the Wikibon analysis on a preliminary basis, we were required to demonstrate results in the field. The S400 Storage Array was delivered and installed by 3PAR at California State University, East Bay (CSUEB), which is a PG&E customer. Full details of the CSUEB green project can be found in this Wikibon case study. A configuration report was run against the machine by 3PAR. Wikibon, PG&E and its outside engineering firm used this data to confirm:
- The hardware installed, in particular the number and type of disks installed.
- The software installed, in particular the virtualization and thin provisioning software that reduces the number of disks required and enables the storage array to be more power efficient.
- How many disks were saved by the software, and how much power was saved as a result.
This report confirmed the following:
- 40 terabytes of disk were installed on the 3PAR array on 115 disk drives.
- 99 x 300 gigabyte and 16 x 750 gigabyte drives were installed in the array (total 115 drives).
- Software to enable virtualization and thin provisioning was installed on the array.
- A detailed configuration report was run 148 days (5 months) after installation, which showed:
- The amount of data that is actually written to disk is seven (7) terabytes.
- The amount of data that the application is utilizing is sixteen (16) terabytes.
- The ratio of the data the application is consuming to the amount of data actually written to disk is 2.2 (this compares to an average of 2.5 found in the study of over one hundred (100) 3PAR virtualization and thin provisioning customers detailed in this report).
- The saving in energy after 148 days comes from eliminating the need to install 24 disk drives.
- After one year, the total saving in disk drives is projected to be 59 disk drives.
- In addition (but not included in the calculation of the PG&E energy rebate), CSUEB removed two storage arrays that drew 2.8 kW.
- The energy saving in the first year, including UPS and cooling overheads, is 52,815 kWh.
- The overall energy cost saving over 5 years that CSUEB will make from the installation of virtualization and thin provisioning is $35,907.
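A few of the field figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this section; the function names are hypothetical, and the 8,760 hours in a non-leap year is the only outside constant.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def installed_capacity_gb(drive_counts_by_size_gb):
    """Sum raw capacity from a mapping of drive size (GB) -> number of drives."""
    return sum(size_gb * count for size_gb, count in drive_counts_by_size_gb.items())


def average_kw(kwh_per_year):
    """Convert an annual energy figure into the equivalent continuous power."""
    return kwh_per_year / HOURS_PER_YEAR


capacity_gb = installed_capacity_gb({300: 99, 750: 16})  # drive mix from the report
usage_ratio = 16 / 7                  # TB utilized / TB written, rounded inputs
saving_kw = average_kw(52_815)        # first-year saving, incl. UPS and cooling

print(capacity_gb, round(usage_ratio, 2), round(saving_kw, 2))
```

The drive mix gives about 41.7 TB raw, in line with the stated 40 terabytes. The rounded terabyte figures give a ratio near 2.3; the quoted 2.2 presumably comes from unrounded values, which is an assumption since the underlying data is not shown here.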
Customer Considerations and Caveats
The virtualization and thin provisioning technology embedded in 3PAR's InServ arrays is industry leading and clearly demonstrates improved utilization in the field. The technology directly results in fewer spinning disks being required and in improved energy efficiency, as disk drives typically represent approximately 60-80% of the power consumed in mid-range to high-end arrays.
The power supplies used in 3PAR's InServ arrays hover close to 80% efficiency but cannot be demonstrated to sustain 80%+ efficiency. While this is a relatively small consideration in the overall power equation, every little bit helps, and 3PAR is expected to include an 80+ efficiency rating in future designs.
Wikibon found that the reports generated from the 3PAR array, while reasonable, needed some work to interpret. Customers would benefit from simpler reporting to be able to demonstrate to management the energy saved by making an investment in 3PAR arrays.
In Wikibon's view, 3PAR arrays demonstrate industry leading storage efficiency and can lead to substantial energy savings relative to conventional arrays without advanced virtualization and/or thin provisioning technologies.
Appendix I: Drive Measurements
Appendix II: Calculation of Customer Disk Savings
The figures above were used as inputs into the following Table 3, which was used to calculate the percentage of additional disks that would have been required if virtualization and thin provisioning were not available on the 3PAR S400 storage array installed.
The key conclusions are:
- The additional disks that would have been required if virtualization and thin provisioning were not installed would have been 24 disks
- After one year, the total saving in disk drives is projected to be 59 disk drives
- The percentage of additional disks that would have been required if the virtualization and thin provisioning software was not installed was 46%
Acknowledgments
Wikibon is grateful to Pacific Gas & Electric Company and the California Public Utilities Commission for funding the incentives that motivate this study. We would also like to thank 3PAR technical & marketing managers and the Cal State University East Bay data center managers for their support and cooperation.
Disclaimer
This report was prepared by Wikibon. Neither Wikibon, PG&E, nor any of their employees makes any warranty or representation, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any data, information, method, product, or process disclosed in this document, or represents that its use will not infringe any privately-owned rights, including, but not limited to, patents, trademarks, or copyrights.
This report uses preliminary information from vendor data and technical references. The report, by itself, is not intended as a basis for the engineering required to adopt any of the recommendations. Its intent is to inform the customer of the potential cost savings. The purpose of the recommendations and calculations is to determine whether measures warrant further investment of time and/or resources.