Introduction
As part of a recent storage-related study of vSphere 5 adoption, Wikibon conducted seven in-depth interviews with users provided by three of the vendors reviewed in our study. This respondent is responsible for operations at a European business social network site. Ninety-nine percent of its systems are [http://www.debian.org Debian Linux] based. It runs two co-located data centers. The discussion below is excerpted from the highlights of our one-hour interview with the respondent.
Interview
Wikibon - What led you to VMware vSphere 5?
Case Study Respondent - We did not have vSphere 4. Last year we started with some ESXi servers to find out if virtualization was something we should use. Those worked out for us, so our next step was to evaluate alternatives to assess the best approach for a production environment. VMware stood out as the most stable environment, so we decided to go further down that path. Because we are using Debian Linux, we ended up being a special beta site for vSphere 5.
VMware sent us a consultant who was a fan of Linux, so from the very beginning we had a co-worker relationship – not like a consultant working in a customer’s company. And I think that was one of the key reasons for the success of the project. We didn’t have Windows systems before we decided to virtualize. We have to run some now because the vSphere environment requires that we have some servers running Windows. But using vSphere 5 we have had a lot more opportunities to run systems on Linux instead of Windows. That was one of the points that drove us to start with vSphere 5.
Wikibon - Why did you move to virtualization in the first place?
Case Study Respondent - One of the main drivers for our choosing to virtualize was our development environment. Our old environment was unsatisfactory because someone coming in with a high-performance test could affect the work of every other developer working on that same system. So we decided to virtualize to make it possible for developers to do their own tests and to work in parallel without affecting their coworkers’ work if they made mistakes or needed to do performance testing.
Wikibon - What’s next for you in virtualization?
Case Study Respondent - Our CTO and member of the executive board recently asked me what I thought about our being cloud-ready within three years. He would like every system we are running right now to be able to be virtualized by 2014. I’m not sure we can get to 100%, but maybe 95% would work.
The thinking is that, first, we should be state-of-the-art by getting away from physical servers and better utilizing hardware. Second, we need to ensure high availability and high resilience. If you are committing to virtualization, you have to change your whole infrastructure - you need systems that are able to run on their own and not rely on a fixed infrastructure. So the systems themselves get more stable and robust if you make them ready for virtualization, especially if you start to move virtual machines across data center boundaries.
Wikibon - What databases do you run, how are you managing the virtualization of those, and how are you thinking about that in the future?
Case Study Respondent - The databases for our production environment are MySQL, none of which are virtualized. These databases talk to our customers, so they are usually I/O bound. They need to be fast. If a database master breaks, the system goes down. In the worst case, I might lose user data - and that must not happen. So we have been extremely careful in [our] approach to virtualizing databases.
Recently I asked a large outsourcing player in our field, “How do you virtualize your Oracle systems?” He answered, “We do, but only to a certain extent, and very carefully.” I think it’ll be very interesting, because it would be great to be able to move databases around to have very high availability. But I also need to be sure that the system is performing - at a minimum - at the same performance level as physical hardware. For us right now, we have systems consuming up to 96 GB of random-access memory for a single database. And that’s exactly the amount of memory that I get when I buy a vSphere 5 license. So from a cost perspective, it doesn’t really make sense for me. It’s more the fault tolerance that I’m interested in.
We have also tested VMware’s GemFire database [1]. It’s extremely interesting, especially because it has no master [node]. But it doesn’t fit into our development environment right now.
As we head down the total virtualization path, we will have to consider whether MySQL is still the right database for everything we do. I think we will end up having different databases for different purposes. For me, it’s easier if I make a very careful decision about which vendor to go with and then stay with them for the next three or four years.
However, we are still looking at these things to make sure that we don’t overlook something. We have set up test servers with MySQL databases in a virtualized environment to see how it works.
Wikibon - Can you talk about the management tools, particularly for storage? Do you use vCenter or the storage array management tool or third-party tools?
Case Study Respondent - We are using vCenter where possible and sometimes switching to the user interface from our array vendor to manage some things. We’re trying to make as much use of the VMware tools as possible to manage everything. We sometimes have special requirements that need to be fulfilled using the array vendor’s tools. We prefer not to click through both tools but to use API calls and have our own interface do the work, especially to make it repeatable for the future. It’s always easier to have a script that you can rerun instead of trying to reproduce the same click path through the graphical interfaces.
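The respondent’s point about scripts versus click paths can be sketched as follows. This is a minimal, hypothetical example: the array endpoint, credentials model, and payload fields are assumptions for illustration, not a real vendor API. The idea is that the operation is captured in code that can be reviewed and rerun, unlike a sequence of GUI clicks.

```python
import json
import urllib.request

# Hypothetical storage-array REST endpoint; a real array vendor's API
# would have its own base URL, authentication, and schema.
ARRAY_URL = "https://array.example.com/api"

def make_lun_request(name, size_gb):
    """Build the JSON body for a 'create LUN' call. Keeping the payload
    in one place means the same operation can be rerun or audited,
    which is the advantage over a manual click path."""
    return {"name": name, "sizeGB": size_gb, "thinProvisioned": True}

def create_lun(name, size_gb, opener=urllib.request.urlopen):
    """Send the request. The opener is injected so the sketch can be
    exercised without a live array on the network."""
    body = json.dumps(make_lun_request(name, size_gb)).encode()
    req = urllib.request.Request(
        ARRAY_URL + "/luns",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return opener(req)
```

Because the payload construction is a pure function, provisioning the same datastore LUN next quarter is a one-line rerun rather than a remembered click sequence.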
Wikibon - What tips would you give somebody starting to get the best out of this environment, and any things to avoid?
Case Study Respondent - I would recommend that they really make sure that they’re using the right products. Ask for some kind of proof of performance claims. Don’t believe it when you hear, “Everything’s great. Everything works.” Buy some trial licenses and try to find out if it really fits your needs. And if you’re new to virtualization, you will most likely have to rethink your decisions within the first year. This isn't a bad thing. Keep some of your budget in your back pocket, because you will need it to correct some decisions you made your first time through. You might have gone down a wrong path or a less efficient way of doing things.
Wikibon - Is there anything you would like to see VMware do in the storage area?
Case Study Respondent - I would like to have VMware be available on Linux only with a PostgreSQL database, or even MySQL, and not have to run Windows or Oracle to get the value out of the system.
Wikibon Observations
Databases remain the greatest challenge to achieving high levels of virtualization. Often the few large databases that cannot be virtualized represent a small percentage of machines but a very high percentage of cost and value to the organization. The core problem is the I/O tax. At a minimum, there is the latency and overhead of I/O being handled by both the operating system and the hypervisor, even when there is only one virtual machine on a physical machine. The worst case is multiple virtual systems sending I/O to the hypervisor, which acts as a blending machine, turning sequential and random I/Os into purely random I/O. This significantly increases latency, variance, and overhead in traditional disk-based systems. Databases that operate near or at a performance ceiling (or may at certain times of the week, month, or year) are likely to be carefully protected by application and database administrators, who will need to run extensive tests and retests before they risk moving to a new environment.
There are potential cost and availability advantages to having a database as the only virtual machine on a physical machine; however, these approaches are waiting on better flash support within VMware for databases and better I/O architectures.
An issue that concerns service providers is the management of multiple system components. Having multiple storage management systems requires manual coordination and integration, with high costs. Service providers are after significantly improved automation and economies of scale, and a key part of this is the use of RESTful APIs to both gather information and automate processes. Storage (and other) vendors wishing to address the service provider marketplace will need to provide a full suite of management APIs that will allow full automation. The same approach will also be required in large-scale enterprise data centers.
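The “gather information” half of that automation loop can be illustrated with a short sketch. The inventory shape below (a JSON document listing datastores with capacity and free space) is a hypothetical stand-in for what a management API would return; the field names are assumptions, not any vendor’s actual schema.

```python
import json

def low_space_datastores(inventory_json, min_free_fraction=0.2):
    """Return the names of datastores whose free/capacity ratio falls
    below the threshold - the kind of check an automation loop would
    run against a REST inventory endpoint instead of eyeballing a GUI."""
    datastores = json.loads(inventory_json)["datastores"]
    return [
        d["name"]
        for d in datastores
        if d["freeGB"] / d["capacityGB"] < min_free_fraction
    ]

# Example inventory as it might come back from a (hypothetical) GET call.
sample = json.dumps({"datastores": [
    {"name": "vmfs-prod-01", "capacityGB": 2048, "freeGB": 150},
    {"name": "vmfs-dev-01", "capacityGB": 1024, "freeGB": 600},
]})
# 150/2048 is below the 20% threshold, so only vmfs-prod-01 is flagged.
```

Once information gathering is a function like this, the “automate processes” half follows naturally: the flagged list can feed a provisioning or alerting step without any manual coordination between management consoles.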
Notes: [1] GemFire is a memory-based system that allows much higher levels of real-time processing, but it also brings the challenge of managing large numbers of memory nodes in a production environment.