Ryan Perkowski has 450 terabytes installed. Half of it (225TB) is performance- and availability-critical. At today's prices for tier-1 storage, that is a current storage value of less than $2 million on the floor. So why did Ryan pay $450K for Virtual Instruments tools to monitor the SAN?
The method Ryan uses to convince his customers is to properly cost out the all-inclusive cost per TB that the end user pays. The cost to his business customers for mission-critical tier-1 storage is $60,000/TB, ten times the purchase price. It includes the costs of backup and recovery, performance and availability assurance, additional copies, the storage network, compliance, storage staff, and the monitoring tools. Sure, if a project will work with tier-2 storage without the frills, go for it. But the cost of bolting on those services later, if they turn out to be required, will be much higher for the project team than taking the standard storage services. And if a performance SAN is required, the cost of the monitoring software is a small proportion of the total.
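The arithmetic behind this argument can be checked directly from the figures quoted above (450TB installed, half of it tier-1, $60,000/TB all-in at ten times the purchase price, and a $450K tool spend). A minimal sketch, using only those numbers:

```python
# Worked example using only the figures quoted in the article.

total_tb = 450                 # total installed capacity
tier1_tb = total_tb // 2       # 225 TB is performance/availability critical

all_in_cost_per_tb = 60_000                      # all-inclusive cost per TB
purchase_cost_per_tb = all_in_cost_per_tb // 10  # "ten times the purchase price"

purchase_value = tier1_tb * purchase_cost_per_tb  # raw value "on the floor"
all_in_value = tier1_tb * all_in_cost_per_tb      # fully loaded cost to the business

monitoring_cost = 450_000
share = monitoring_cost / all_in_value

print(f"Tier-1 purchase value: ${purchase_value:,}")   # $1,350,000 (under $2M)
print(f"Tier-1 all-in value:   ${all_in_value:,}")     # $13,500,000
print(f"Monitoring share of all-in cost: {share:.1%}")  # 3.3%
```

Against the purchase price alone, $450K looks like a quarter of the asset value; against the all-in cost the business actually pays, it is a few percent.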
What are the benefits of performance knowledge? There are four levels of justification:
- Cost Avoidance: Storage that was going to be bought to solve a perceived I/O problem that was not actually a storage problem is not bought. In Ryan's case, he avoided upgrading an EMC DMX3 to a DMX4, because the Virtual Instruments probe provided detailed, correlated historical information showing that reducing the block size at the database level would solve the performance problem. This alone more than paid for the whole installation. By knowing the end-to-end performance characteristics across the SAN, he avoided the cost of over-provisioning storage "just in case".
- Time to Solution: By knowing that the problem was not in the SAN, and by providing a wealth of information to help identify server-side problems, projects can be rolled out more quickly. The tool is being used by the database groups to help them implement better solutions and solve problems faster. The price of additional probes is included as part of the development cost.
- Rationalization of Storage Software on the SAN Components, particularly at the switch and array level: Each component should have the tools to manage its own operations, but end-to-end management should be left to best-of-breed heterogeneous tools that take all the data from all SAN components and correlate it historically. Ryan is in the process of removing some of the storage management software and saving a bundle.
- A Deeper Understanding of Storage Performance Trends: In Ryan's case the data show that capacity growth has gone from exponential to linear, but that access density is going through the roof; the data is being exploited much more heavily. Ryan is in a position to know that he has to think about performance storage architectures capable of delivering much higher levels of IOPS/TB than the current architecture can provide. He has the data and charts to show it, and the confidence of senior IT management that there is value in exploiting the data and a justified price to pay for improved storage IOPS performance.
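The trend Ryan is tracking, access density, is simply sustained IOPS divided by installed capacity. A minimal sketch of the calculation; the capacity and IOPS figures below are hypothetical, chosen only to illustrate capacity growth flattening while workload keeps climbing:

```python
# Access density = sustained IOPS / usable capacity (IOPS/TB).
# All figures below are hypothetical, for illustration only.

def access_density(iops: float, capacity_tb: float) -> float:
    """IOPS per TB: how heavily the installed data is being exploited."""
    return iops / capacity_tb

years = [2008, 2009, 2010]
capacity_tb = [150, 300, 450]               # capacity growth turning linear
sustained_iops = [30_000, 90_000, 250_000]  # workload still accelerating

for year, tb, iops in zip(years, capacity_tb, sustained_iops):
    print(year, f"{access_density(iops, tb):.0f} IOPS/TB")
```

When the IOPS/TB figure rises year over year, capacity-optimized arrays stop being the right architecture, which is exactly the conclusion the historical data let Ryan defend to senior management.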
So are SAN end-to-end performance tools the solution to all performance problems? No; they have profound limitations. They provide a snapshot of the SCSI conversations between HBA and disk. If a component of the SAN does not provide detailed data, those correlations will be missing. The SAN tools do not provide an application view of performance, or show whether the storage system is meeting the SLAs for that application. And in a virtual server environment, they do not show the relationship between the I/Os and the virtual machines from which they came.
This data is required in the open-systems arena, but developing it requires a new management model and new standards. Companies such as EMC are attempting to introduce these models and tools in VMware, and the very recent partnership between HP and Microsoft claims to be aiming at the same problem.
Today it takes an army of experts to solve a deep performance problem in a virtual machine environment. The creation of more proprietary stacks should provide better end-to-end tools and reduce the size of that army; eventually the tools may be automated enough to eliminate it altogether.
Action Item: While we are waiting for nirvana, IT storage managers and senior IT management with high-performance SANs would do well to follow Ryan's philosophy: focus on a very few best-of-breed third-party tools for end-to-end SAN management, use vendor tools for within-component management, and get rid of everything else.