Most organizations today are mostly or completely virtualized. As their existing 3-tier infrastructure ages, they must decide whether to refresh their existing servers and SANs or migrate to a web-scale or hyper-converged platform.
IT leadership often wants a total cost of ownership (TCO) analysis comparing the costs of both platforms over the lifecycle of the equipment. In these cases, it is important to factor in variables such as administrative costs, rack space requirements, and scalability expectations.
To assist IT staff with this analysis, I have developed a formula for the monthly TCO per virtual machine (VM) over N years, using the following variables:
N = The length of the analysis period in years. I typically evaluate the cost of ownership over a five-year period.
E = Equipment Cost. The initial cost of the equipment, including installation; plus the cost of additional equipment and installation anticipated to meet future demand; plus the cost of refreshing the initial equipment, if appropriate.
S = Support Contract Cost After Expiration of Warranty Period. This applies both to the initial equipment purchased in year 1 as well as any applicable equipment purchased in later years.
A = Administrative Costs. There are a few ways we can approach this.
- We can estimate the fraction of a full-time server, storage, or network administrator required for each solution and multiply it by the average fully burdened salary of the applicable administrator.
- We can estimate hourly time requirements for the various administrative tasks and multiply those by average fully burdened applicable administrator hourly wages.
- We can actually monitor administrative hourly time required for repetitive tasks for each solution and multiply that by the average fully burdened applicable administrator hourly wages.
P = Power cost per month. If hosted internally, the average power draw (in watts) of the various server and storage components can be determined in several ways. For example:
- Power usage can be obtained from manufacturer specification sheets. While this usage can vary depending upon load, it serves as the basis for a good estimate.
- Consumption can be measured with metering tools or, potentially, with the monitoring capabilities of the UPS.
If utilizing a co-location facility, the facility will often have a monthly power charge, either per circuit or per rack, that can be used to calculate the cost of power for the equipment.
R = Rack cost per month. If hosting at a co-location facility, the facility generally charges a monthly cost per rack which can then be divided by the number of U the rack contains and multiplied by the number of U the equipment consumes. When empty rack space is required for air flow purposes, this space should be added to the equipment space requirements.
If hosting internally, an estimated cost per rack will be required based either upon the total square footage of the datacenter or based upon other variables.
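The co-location proration described for R can be sketched as follows. This is a minimal illustration, not a definitive method: the function name and the dollar and rack-unit figures are hypothetical and are not taken from the example later in this article.

```python
def monthly_rack_cost(rack_monthly_cost, rack_units, equipment_units, airflow_units=0):
    """Prorate a per-rack monthly charge by the rack units (U) the equipment uses.

    airflow_units is the empty space that must be left free for air flow;
    it is added to the equipment's own space requirement.
    """
    if rack_units <= 0:
        raise ValueError("rack_units must be positive")
    cost_per_u = rack_monthly_cost / rack_units
    return cost_per_u * (equipment_units + airflow_units)


# Hypothetical figures: a 42U rack billed at $1,050 per month,
# 8U of equipment plus 2U left empty for air flow.
example_rack_cost = monthly_rack_cost(1050.0, 42, 8, airflow_units=2)
print(f"Rack cost: ${example_rack_cost:,.2f} per month")
```

For internally hosted equipment, the same function can be reused once an estimated monthly cost per rack has been derived from the datacenter's square footage or other variables.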
r = Organization's cost of capital.
V = VMs supported.
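One plausible way to combine these variables is sketched below. It is an assumption, not necessarily the formula used for the results in this article: it discounts each year's total cost at the cost of capital r, treats year 1 as undiscounted, and spreads the result over 12 × N months and a single VM count V. The class name, function name, and all input figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YearCosts:
    equipment: float      # E: equipment and installation spend in this year
    support: float        # S: support contract cost in this year
    admin: float          # A: administrative cost in this year
    power_monthly: float  # P: power cost per month
    rack_monthly: float   # R: rack cost per month

    def total(self) -> float:
        """Total cost for the year, annualizing the monthly P and R."""
        return (self.equipment + self.support + self.admin
                + 12 * (self.power_monthly + self.rack_monthly))


def monthly_tco_per_vm(years, vms, r=0.0):
    """Discounted average monthly TCO per VM (assumed formulation).

    years: list of YearCosts, one per year of the analysis period (N = len(years))
    vms:   V, treated here as a single representative VM count
    r:     cost of capital; r = 0 means no discounting
    """
    if not years or vms <= 0:
        raise ValueError("need at least one year and a positive VM count")
    present_value = sum(y.total() / (1 + r) ** t for t, y in enumerate(years))
    return present_value / (12 * len(years) * vms)


# Hypothetical five-year plan for 500 VMs.
plan = [YearCosts(500_000, 0, 10_000, 500, 250)] + [
    YearCosts(0, 0, 10_000, 500, 250) for _ in range(4)
]
undiscounted = monthly_tco_per_vm(plan, vms=500)
discounted = monthly_tco_per_vm(plan, vms=500, r=0.05)
print(f"Undiscounted: ${undiscounted:.2f}  Discounted at 5%: ${discounted:.2f}")
```

Because costs in later years are divided by a larger discount factor, the discounted figure is always at or below the undiscounted one when r is positive.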
An Example
Table 1 shows an example of this approach. Projected CapEx and OpEx costs over a five-year period are calculated for an organization considering migrating to web-scale infrastructure.
Estimated Number of VMs:
The organization has 700 server VMs, and expects to increase this number by around 10% per year.
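The VM projection can be sketched as follows, assuming the 10% growth compounds annually and that year 1 starts at the current 700 VMs. The function name is illustrative.

```python
def project_vms(initial_vms, annual_growth, years):
    """Return the projected VM count for each year, with year 1 at the initial count."""
    return [initial_vms * (1 + annual_growth) ** t for t in range(years)]


projection = project_vms(700, 0.10, 5)
for year, count in enumerate(projection, start=1):
    print(f"Year {year}: {round(count)} VMs")
```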
Initial Investment: We're looking only at the infrastructure cost. In evaluating the web-scale scenario, proof-of-concept (POC) tests indicate support of an average of 50 VMs per node. To support the initial 700 VMs, the organization starts with 14 nodes (4 blocks). We assume the nodes cost $54,834 each, including three years of support.
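The initial sizing can be checked with a short sketch. It assumes four nodes per block, as the power specification in this example indicates, and rounds up to whole nodes and blocks. The function names are illustrative.

```python
import math

NODES_PER_BLOCK = 4      # assumption: one block holds four nodes
NODE_COST = 54_834       # per node, including 3 years of support


def nodes_needed(vms, vms_per_node):
    """Whole nodes required to host the given number of VMs."""
    return math.ceil(vms / vms_per_node)


def blocks_needed(nodes):
    """Whole blocks required to house the given number of nodes."""
    return math.ceil(nodes / NODES_PER_BLOCK)


initial_nodes = nodes_needed(700, 50)
initial_blocks = blocks_needed(initial_nodes)
initial_cost = initial_nodes * NODE_COST
print(f"{initial_nodes} nodes in {initial_blocks} blocks, ${initial_cost:,} initial cost")
```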
Extra Nodes to Support Additional VMs:
The VMs/node density is expected to increase by at least 20% per year due to both Moore’s Law and manufacturer enhancements. This means that one node is added per year, and the 5-year TCO period ends with 18 nodes (5 blocks).
Administrative Cost: Today, Nutanix web-scale enables one administrator to manage around 350 nodes. To take a very conservative approach for the TCO, we’ll use one administrator for every 200 nodes, and multiply the applicable fraction of an administrator by a fully burdened salary of $150,000 per year.
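As a minimal sketch of that calculation, using the one-administrator-per-200-nodes ratio and the $150,000 salary from this example (the function name is illustrative):

```python
def annual_admin_cost(nodes, nodes_per_admin=200, burdened_salary=150_000):
    """Fraction of one administrator needed for the nodes, times the burdened salary."""
    return nodes / nodes_per_admin * burdened_salary


first_year_admin = annual_admin_cost(14)   # 14 nodes at the start
final_year_admin = annual_admin_cost(18)   # 18 nodes at the end of the period
print(f"Start: ${first_year_admin:,.0f} per year, end: ${final_year_admin:,.0f} per year")
```

With 14 nodes this works out to 7% of an administrator, and with 18 nodes to 9%.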
Power Costs: Manufacturer specifications show that the maximum power consumption for one block (four nodes) is 1,350 watts. Multiplying this by 4 blocks gives 5,400 watts. Running the units 730 hours per month at a Texas cost of $0.0797 per kWh, and assuming a 60% cooling factor, results in a total initial monthly power/cooling cost of:
1,350 ÷ 1,000 × 4 × 730 × 1.6 × $0.0797 = $503 per month, or $6,036 per year.
Adding a 5th block in year 5 boosts the total power cost to $628 per month or $7,536 per year.
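The power and cooling arithmetic can be sketched as follows. The rate of $0.0797 per kWh is an assumption: it is the value that reproduces the stated totals of $503 and $628 per month. The function name is illustrative.

```python
def monthly_power_cost(blocks, watts_per_block=1350, hours_per_month=730,
                       cooling_factor=1.6, rate_per_kwh=0.0797):
    """Monthly power plus cooling cost for the given number of blocks.

    cooling_factor of 1.6 adds 60% on top of the equipment's own draw.
    rate_per_kwh is an assumed value chosen to match the stated totals.
    """
    kilowatts = blocks * watts_per_block / 1000
    return kilowatts * hours_per_month * cooling_factor * rate_per_kwh


for blocks in (4, 5):
    cost = monthly_power_cost(blocks)
    print(f"{blocks} blocks: ${cost:,.0f} per month")
```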
Rack Space Costs: For rack cost, we’re taking a widely used industry shortcut and adding a 50% uplift to the power cost.
Monthly Average TCO per VM
The total 5-year cost of ownership for the web-scale solution comes to $1,133,253. The VM TCO formula with a 5% cost of capital results in a discounted monthly average TCO per VM of $83.
Using the formula without a cost of capital (i.e. without discounting the cash flows) results in a monthly average TCO per VM of $92.
Other Variables to Consider
Running this same exercise with conventional servers and storage, or with integrated solutions such as Vblock, will result in a TCO per VM that can be compared directly with the result of utilizing web-scale as in the example above.
Even though the TCO doesn’t quantify many of the variables that would be highlighted in an ROI analysis, it is still worth at least identifying them for both platforms in order to assist with the decision-making process. The most important of these variables include:
Time for deployment of additional infrastructure: This capability can be extraordinarily important in certain situations such as a retail firm needing to quickly ramp up computing capabilities in time for the holiday season.
Resiliency: Because the data in a web-scale infrastructure is maintained on two or more nodes, the risk of downtime is much lower than in traditional server + storage environments.
Performance: The elimination of network latency as a factor, along with auto-tiering of data between disk and flash, can result in significant performance enhancement. Eliminating the need to run all storage traffic through two physical controllers can also significantly improve performance, particularly in a VDI environment, by eliminating read and write storms.
Disaster recovery: The ability to natively replicate at the virtual machine level can significantly reduce both the cost and complexity of effective disaster recovery while improving both replication and recovery performance.
Scalability: The ability of web-scale to scale without limitation one node at a time eliminates the potential requirement for expensive and complex upgrades or even purchase of entire new arrays should the number of VMs or resource requirements exceed projections.
Reducing Hypervisor Cost: The ability for the platform to support multiple hypervisors, particularly simultaneously, has the potential to significantly reduce an organization’s cost of software.
Private Cloud: The fractional consumption capabilities of web-scale – the ability to scale one node at a time – eliminates the budgetary challenges of affording increased traditional server + storage + network infrastructure for business units requiring only modest resources for additional VMs. Fractional consumption also makes it easier to develop a meaningful chargeback system.
Hybrid Cloud: The ability for the platform to replicate workloads to public cloud services can significantly enhance an organization’s cloud strategy.
Author Disclaimer
I work at Nutanix.
Action Item: Choosing whether to deploy legacy 3-tier infrastructure or web-scale/hyper-converged (Server SAN) infrastructure as the platform for a virtualized datacenter is a huge decision. A detailed analysis should be undertaken to compare the total cost of ownership between the two platforms over the lifecycle of the products. The TCO should include variables such as administrative costs, rack space, and power and cooling requirements. IT staff can utilize the TCO formula and example presented in this article to calculate the TCO per VM for the various options they wish to evaluate.