Storage Peer Incite: Notes from Wikibon’s August 21, 2012 Research Meeting
As IT administrators virtualize their production environments, they are discovering that what once was essentially a set of discrete mini-environments is turning into a single huge, constantly changing, and increasingly complex unified architecture.
Problems, changes, and additions that once were isolated can now cause major ripple effects across the data center, in some cases impacting the QoS of multiple unrelated business services. Decisions as simple as replacing a server with a newer model or introducing flash storage suddenly have implications for every application in the environment, including those most critical to the business.
Improving QoS can itself become a major problem. Moving Tier 1 data to flash storage, for instance, looks like an effective way to improve response times for users. But in fact it often just moves the choke point down the line, leaving you to guess what should be upgraded next: a server? A network switch? How can you even be sure that the application in question uses that particular hardware, when the hypervisor is constantly shifting compute loads in response to demand changes?
And proof-of-concept testing, frustrating enough in the pre-virtualization environment, becomes vastly more complex when you cannot be sure what data or applications are going to be using that hardware and under what conditions. And given the speed at which virtualized environments evolve, you may not have the time to complete the test.
In these situations what you really want to do is talk to someone who has already solved the problem in a similar environment. And now you can do the next best thing. A cloud service startup, CloudPhysics, is applying big data analysis techniques to the problem of running large virtualized environments. It has built a large and growing database of information from actual users documenting their experiences with hardware upgrades, problem solving, and other relevant issues. Subscribers can do their own searches, either using canned search "cards" developed to meet the needs of other users or commissioning the CloudPhysics staff to create a new card defining a different search parameter.
On August 20 CloudPhysics users, staff, and third-party experts discussed their experiences and company philosophy and direction with Wikibon members in a Peer Incite meeting. The links to the audio and video recordings of the hour-long meeting are below, along with articles analyzing the major take-aways from the meeting. This is a particularly valuable Peer Incite that should interest all IT staff involved in virtualization projects, regardless of where they are in the process.
Bert Latamore, Editor
On August 20, 2012, Nathan Smith, Senior Citrix Engineer at Centered Networks, and Joachim Heppner, a Senior Manager at Sanofi, joined the Wikibon community to discuss their early experiences working with a cloud-based, big-data repository of performance and configuration data from VMware virtualized environments. Also on the call were John Blumenthal, CEO of CloudPhysics; Noemi Greyzdorf, VP of Strategy and Alliances at Cambridge Computer Services; and well-known VMware employees Duncan Epping and Frank Denneman, who are advisors to CloudPhysics.
CloudPhysics is a startup that emerged from stealth mode on August 19. The company developed and maintains the repository that Nathan and Joachim are evaluating. By the August 19 launch, the repository already included tens of terabytes of configuration settings, performance statistics, and inventory data, and it is growing with continuous data streams from hundreds of environments. CloudPhysics plans to dramatically increase the size of the repository and the number of contributing organizations over the coming weeks.
CloudPhysics uses the data in the repository to provide insight into the performance and availability of virtualized environments under a wide variety of deployment scenarios. From this, the company hopes to deliver best practices that IT professionals, consultants, and suppliers can use to improve operational efficiency and the quality of management and decision making in complex, dynamic environments.
To understand the magnitude of the management challenge in virtualized environments, one needs only to look at the set of possible combinations of server, network, and storage technologies, together with the full complement of options for configuration settings, workloads, and applications. No IT professional, supplier, or consultant can possibly analyze all of the available options to maximize efficiency, performance, and availability. And yet every organization is responsible for meeting service levels, and every vendor is at risk of being blamed when service levels aren’t met.
If IT professionals had unlimited budgets and time, they could replicate production environments and evaluate new technologies or experiment with configuration settings in the current environment to determine the impact on performance and availability. That, however, is not reasonable, given the substantial limitations under which organizations operate. Instead, they are better served by learning from the experience of their peers in environments that approximate their own. While the insight and recommendations may not be perfect, they will certainly improve the quality of decision making.
The applicability of the insight and the quality of best-practice recommendations will be directly correlated with the amount and variety of data in the repository. Therefore, it is important for CloudPhysics to garner the support of the supplier community in embracing this approach. Emerging suppliers or established suppliers entering new market segments should benefit a great deal by having data regarding their installations in the repository. According to CloudPhysics, Fusion-io is already working with the company to validate the benefits of Fusion-io in virtualized environments.
Participation by IT professionals ultimately will determine the success of the CloudPhysics approach, since they are the ones who will supply the bulk of the data. CloudPhysics has no current plans to compensate users for contributing their data, relying instead on a belief that users will see sufficient benefit from gaining access to the learnings from other contributors’ data. Of note, CloudPhysics does provide safeguards against the release of proprietary information regarding any specific organization’s environment, but Chief Information Security Officers and Compliance Officers will certainly want to take a close look at policies and procedures before allowing their IT department to contribute data to a shared cluster.
CloudPhysics is currently building out a set of applications, called Cards. These are focused tools that solve specific problems, such as ensuring the health of an HA cluster, evaluating data-store contention, or determining data-store utilization. As Nathan and Joachim agreed, CloudPhysics’ Cards give a quick, high-level assessment of an environment along with actionable recommendations to improve performance and availability.
Action item: Given the range of options and the dynamic nature of virtualized environments, it is enormously challenging for suppliers, consultants, or IT practitioners to reliably predict the impact and benefit of new technology or to optimally tune existing environments. There is great promise in a massive repository of configuration and performance data from a wide range of environments that can be analyzed to determine best practices, evaluate scenarios, and assess risk. IT professionals, consultants, and the supplier community can all benefit from such an approach, but in order to do so, they will need to be willing to share.
Footnote: In an August 25 press release, CloudPhysics announced that it had received a $2.5 million investment from the Mayfield Fund and well-known angel investors, including VMware co-founders Diane Greene and Mendel Rosenblum and former Veritas CEO Mark Leslie.
Big Data is all the rage these days. Organizations are harnessing the power of data to gain insights and improve decision-making. It’s hard to read an article that doesn’t contain the phrase and it’s often difficult for CIOs to determine just how “Big Data” can help them in their day-to-day work.
In what might seem like a game of buzzword bingo, a new cloud-based service actually leverages aggregated data to help CIOs better manage their VMware environments. On August 20, 2012, the Wikibon community discussed how this new player—CloudPhysics—is working to bring big data to bear to make improved virtual operations a reality.
Workload planning and proofs of concept
For CIOs, running proofs of concept to determine workload needs has long been a constant task. Under the old paradigm of a single server running a single application, sizing calculations were relatively simple to perform, although they still took staff time, and an error in the calculations would result in a poor implementation. The approach was also reasonably scalable: As needs increased, CIOs simply bought more hardware to throw at the problem, an efficient calculation made possible by the one-to-one relationship between server and application.
Today’s IT environment barely resembles the environment of just ten years ago. Whereas the older environment simply added server after server after server to meet new needs, CIOs today are leveraging extremely complex virtual environments with shared servers, shared networks, and shared storage. These are much more complex, and change is constant, with applications being dynamically migrated among hosts as administrators intervene or as automated rules governing workload management indicate that workloads are better suited elsewhere.
Further, as IT organizations begin embracing cloud providers as extensions of the primary data center infrastructure, these environments will continue to grow in complexity.
Due to the intertwined web that is today’s data center environment, creating proof-of-concept environments to gauge how new applications may interact and integrate with production is difficult and increasingly time-consuming. Worse, it is next to impossible to determine how a new application will really behave in a truly dynamic environment that changes every minute.
For every problem, there is a solution. Someone, somewhere has probably devised a solution for most of the application integration problems. That’s where CloudPhysics comes in. This startup leverages crowdsourcing to create data sets that will allow clients to determine the kind of information they need when they deploy a new application. In its model, people will share their application performance information with CloudPhysics anonymously, and this data will be aggregated for consumption by others.
Suddenly, a CIO shifts from “best guess” decisions around sizing for application needs to being able to make data-driven decisions based on real-world information by leveraging a community.
A significant impact
The potential impact is significant. CIOs get a better result, and they no longer need to task staff with months-long, costly proofs of concept that divert them from business-facing work. New solutions can be deployed without proof-of-concept performance testing, meaning IT can get solutions to market much more quickly than otherwise possible.
Action item: Although words like "cloud", "Big Data", and "crowdsourcing" are often vague and overused, CloudPhysics provides a prime example of these forces coming together to deliver tangible benefit for CIOs. Those with significant application performance testing needs should look at CloudPhysics and consider participating in the company's ongoing application performance gathering and assessment ventures. By doing so, they may be able to leverage this cloud-based service to eliminate a significant internal task, resulting in a better overall outcome and more IT staff time devoted to solutions development.
When single servers ran single applications attached to dedicated resources, managing the performance and availability of the environment was relatively simple, although inefficient from an infrastructure-utilization standpoint. Server, network and storage consolidation improved efficiency factors. This led to the need for domain-specific tools for application, server, network, and storage administration. The dynamic nature of virtualized environments demands a more-consolidated view of the impact of changes to the environment across the entire infrastructure and application stack. Without such a view, it is extremely difficult to determine the effect on the overall environment.
Even now that many monitoring solutions are integrated with virtualization platforms, critical data around, for instance, storage performance is still captured only at the per-server level. This makes it impossible to get a holistic view. CloudPhysics has integrated information across the entire stack, using well-established APIs as well as custom integration with VMware, to give deep insight into performance and availability metrics.
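To see why per-server collection hides problems, consider latency samples collected host by host and then rolled up by the shared resource they touch. The sketch below is illustrative only; the field names and sample values are assumptions, not CloudPhysics' actual data model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-host latency samples; each host's own view may look fine,
# but grouping by the shared datastore exposes contention a single server
# could never see.
samples = [
    {"host": "esx01", "datastore": "ds-tier1", "latency_ms": 3.0},
    {"host": "esx02", "datastore": "ds-tier1", "latency_ms": 21.0},
    {"host": "esx03", "datastore": "ds-tier2", "latency_ms": 5.0},
]

# Roll the per-host samples up by datastore to get the holistic view.
by_datastore = defaultdict(list)
for s in samples:
    by_datastore[s["datastore"]].append(s["latency_ms"])

for ds, vals in sorted(by_datastore.items()):
    print(f"{ds}: avg={mean(vals):.1f} ms, max={max(vals):.1f} ms")
```

Here esx01's 3 ms view of ds-tier1 looks healthy in isolation; only the aggregated view reveals that the same datastore is serving another host at 21 ms.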
By collecting configuration and performance data from a large number of environments, CloudPhysics will be able to analyze a wide variety of environments and determine best practices for end-users, consultants, and suppliers. This big data approach will enable the creation of simulation engines and advanced models, which let the end-user, consultant, or architect test hardware and software changes without risk.
Correlating settings and configurations and understanding the impact each setting has on a complex environment, such as a virtual infrastructure, is time-consuming and, above all, very difficult. Correlating these metrics not only saves time but also helps you understand the behavior of your environment and proactively improve it where required.
Action item: Big data is helping every industry make data-driven decisions using advanced simulation and modeling methods. Integrated solution stacks require big data analytics to decrease the operational expenditure associated with these environments. CloudPhysics brings this approach to the world of IT, enabling users to contribute to the platform, which supports its organic growth, and, more importantly, to gain deep insights into their own datacenters.
With the exception of the latest generation of converged-infrastructure appliances, today’s virtualized environments are typically a collection of components from a large number of suppliers. A recent Wikibon study of VMware customers indicates that 56% are using one or more hypervisors in addition to VMware.
The virtualized world poses major challenges for IT administrators, as troubleshooting up and down the stack, from application to compute to network and storage, is often guesswork rather than science. IT needs an end-to-end view of the environment to manage performance and availability and to continually improve the user experience. It is unlikely that a viable solution will come from the virtualization vendors themselves.
Meanwhile, hardware infrastructure suppliers have been largely unsuccessful developing tools for heterogeneous environments and have also had a tenuous relationship with third-party performance management and monitoring applications. The strategic question for vendors is whether they want to develop their own best-in-class tools for managing their infrastructure components, collaborate and integrate with third-party management application suppliers, or pursue a hybrid approach.
According to founder John Blumenthal, “The CloudPhysics platform exists to help IT make the best decisions in the face of continual change. We make the invisible visible, revealing structure and relationships in the datacenter that are hard to surface. Applying big data science and resource management domain expertise, CloudPhysics delivers focused and unique analytics to IT’s decision cycle. The datacenter becomes smarter, faster, and more effective.”
CloudPhysics’ intelligence is driven by anonymized data provided by users who recognize the value of benchmarking and comparing their virtualized environments with other, similar IT environments. Benefits to community members include less time spent tuning and testing systems as well as ongoing insights and best practices for managing virtualized resources. Described as Resource-management-as-a-Service (RaaS) or virtual expertise, CloudPhysics intends to deliver on the promise of running the most optimal, cost-efficient virtualized environment possible.
IT professionals are loath to read 50-page best-practices documents in order to deploy a new system. Moreover, no single vendor has visibility across today’s increasingly heterogeneous and virtualized datacenter, which uses products from a variety of suppliers. A vendor-neutral community that offers benchmarking services and recognizes that users will share anonymized data in order to resolve hypervisor performance issues more quickly should be a welcome addition to any IT administrator’s tool kit.
Action item: IT professionals have diminishing time to learn supplier-specific tools and so will likely look favorably upon offerings such as CloudPhysics, which provides best-practice guidance based upon a large number of users and environments. Suppliers will need to adjust their business and investment models to adapt to this new reality.
On the August 21, 2012, Peer Incite, early evaluators of CloudPhysics discussed the benefits of comparing and modeling their virtualized environments against the best-practice experiences of their peers. CloudPhysics takes a big-data approach to analyzing configuration, setting, and performance data to determine best practices. For the CloudPhysics approach to be successful, two things have to happen:
- CloudPhysics has to develop a meaningful and substantial baseline body of data against which organizations can compare their own environments, and
- Organizations have to be willing to share information about their own configurations, their settings, and their environments’ performance.
There are two hurdles in sending data off-site. The first is that doing so must be simple and non-disruptive for the operations professionals. CloudPhysics has addressed this by building all the data-collection capabilities into an application that runs within the virtualized environment. The second is that sending any proprietary information off-site typically involves gaining the approval of the Chief Information Security Officer (CISO) and/or the Chief Compliance Officer (CCO). In order to speed the process of gaining CISO and CCO approval and address their concerns, CloudPhysics encrypts and anonymizes user data, provides access control to the user, and provides comprehensive documentation.
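CloudPhysics has not published the details of its anonymization scheme, but the general pattern for this kind of safeguard is to replace identifying fields with stable keyed hashes before any record leaves the site, so the same object still correlates across uploads while its real name is unrecoverable. The sketch below is a minimal illustration of that pattern; the field names and the HMAC scheme are assumptions, not CloudPhysics' actual implementation.

```python
import hashlib
import hmac
import json

# Hypothetical per-customer secret: held on-site, never uploaded, so the
# repository operator cannot reverse the hashes.
SITE_KEY = b"per-customer-secret"

def anonymize(value: str) -> str:
    """Replace an identifier with a stable keyed hash: records from the
    same object still correlate, but the original name is unrecoverable."""
    return hmac.new(SITE_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def scrub(record: dict) -> dict:
    """Anonymize identifying fields; pass performance metrics through."""
    sensitive = {"vm_name", "host_name", "datastore"}
    return {k: anonymize(v) if k in sensitive else v for k, v in record.items()}

record = {"vm_name": "payroll-db-01", "host_name": "esx07.corp.example",
          "datastore": "tier1-ssd", "read_latency_ms": 4.2}
print(json.dumps(scrub(record)))
```

The design point is that the performance data stays analyzable in aggregate while nothing in the uploaded record ties it back to a specific organization's naming scheme, which is exactly the property a CISO or CCO will ask about.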
Action item: Leveraging a cloud-based service where an organization’s data is delivered to a big-data repository of configuration and performance data is new territory. The chief information security officer, chief privacy officer, and/or chief compliance officer need to be engaged early in the process so that their concerns around security, privacy, and anonymization of data are addressed. Otherwise they will become an impediment to moving forward. They will need to know who has access to the data, how controls are maintained, and what controls remain with the organization. Some organizations, including most government agencies, will want a private cluster of their own data, which accesses public data but to which the organization does not contribute its own data.
Server virtualization is transforming data centers, increasing utilization of physical resources and introducing new architectures across all layers of infrastructure. IT staffs are able to manage larger environments and respond with more agility than with legacy physical environments. That being said, IT budgets are tight, and staffs are strained to keep up with growth. On Wikibon’s August 20, 2012 Peer Incite, the community discussed how CloudPhysics is looking to bring A Big Data Approach to Managing VMware Environments.
Since IT staffs are already stressed to the max, Wikibon always looks for things IT can eliminate when deploying a new technology or service. CloudPhysics holds the potential to reduce or eliminate proofs of concept and what-if scenarios in physical environments. Users will be able to simulate changes through modeling before putting them into production. This PoC modeling could even be better than existing processes: The aggregated community data can give a more accurate picture of what will happen when a solution is deployed in production than a test environment would. Nathan Smith, Senior Citrix Engineer with Centered Networks, says that CloudPhysics’ "cards" will display simple red/green indicators for VMware configurations, such as whether HA is set up properly, that even a non-technical person can understand.
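A red/green card of the kind Nathan describes boils a cluster's configuration down to a handful of pass/fail checks. The sketch below shows the general shape of such a check; the specific settings and thresholds are illustrative assumptions, not CloudPhysics' actual rules.

```python
# Hypothetical HA-health "card": returns a single red/green verdict that a
# non-technical reader can act on. All keys and thresholds are assumed.
def ha_card(cluster: dict) -> str:
    """Return 'green' if basic HA prerequisites look healthy, else 'red'."""
    checks = [
        cluster.get("ha_enabled", False),             # HA must be turned on
        cluster.get("host_count", 0) >= 2,            # need a failover target
        cluster.get("admission_control", False),      # reserve failover capacity
        cluster.get("hosts_with_single_nic", 0) == 0, # redundant heartbeat paths
    ]
    return "green" if all(checks) else "red"

print(ha_card({"ha_enabled": True, "host_count": 3,
               "admission_control": True, "hosts_with_single_nic": 0}))
```

The value of the card model is that the judgment calls (which settings matter, what counts as healthy) are encoded once by experts and then applied uniformly, rather than rediscovered by each administrator.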
By accessing a global pool of data and data scientists who are interpreting the data, CloudPhysics provides tools that help customers make better decisions. This will improve all steps in the operation of the overall virtualization environment from discussions with vendors through rapid deployment and performance optimization. While IT staffs will still need competent administrators, leveraging the crowd will give them access to an additional group of experts without having to grow the talent or experience in-house.
CloudPhysics is in the process of gathering anonymized data (see CloudPhysics.com for details on how to participate), and data sharing is not yet available. In addition to gathering configuration data, CloudPhysics is looking for the community to challenge its team with the biggest problems and puzzles faced in virtualization environments. Shared wisdom and a new generation of analytics creates the opportunity to simplify and streamline the operations of virtualization environments.
Action item: With the availability of a giant repository of industry best practices, IT professionals can streamline and perhaps eliminate the proof-of-concept approach, particularly around performance, interoperability, and HA. At a minimum, they will be able to shorten the time required for, and improve the success rate of, future changes to the environment.