Storage Peer Incite: Notes from Wikibon’s April 10, 2007 Research Meeting
David Floyer presents To Blade or Not to Blade? Blade computing can offer substantial cost reductions by stripping out key components such as power and cooling and sharing them across more servers. But blades can't serve all workloads, and organizations must have a critical mass of blade-friendly applications to benefit.
Over the past few years, the remaining very large hardware vendors (e.g., Dell, HP, IBM, Sun) have focused much of their innovation on blade computing: the packaging of CPU, memory, network, and I/O capabilities using a common set of technologies that can be easily added to or removed from a shared frame with common support and environmental resources, all under the control of a single set of management and console resources. While all of these vendors use the same nomenclature and similar concepts, the reality is that blade products are still distinct enough that mixing and matching is not advisable. This places some users in a quandary regarding their use of blade technology.
Specifically, as each vendor attempts to position its blade products not only as superior to other vendors' products but also to position blade computing itself as superior to other types of computing, enormous promises are being made about the degree to which complex hardware configuration and operations work will disappear as a consequence of adopting blade computing.
The reality is that for applications at the margin, such as very large transaction processing applications or other applications with very high write-to-read ratios, configuring applications to work on specially configured hardware will remain a challenging and complex undertaking. However, for the broad array of applications such as email, Web serving, and even many analytics applications, blade computing can offer the benefits of simpler configuration, change, and operations from a hardware perspective.
Users must recognize, though, that they will pay a premium for lower-volume blade complexes to cover the true costs of large blade racks and packaging technologies, as well as vendor markups for their specific solutions. Therefore, small and medium-sized businesses should be very careful to assess the marginal benefits versus the marginal costs of using blade technology. Typically, we found that application loads requiring fewer than 5-10 blades (15-20 CPUs) are often best served by traditional standalone server technologies under a single console. Over time, we expect to see a convergence of blade standards as competition in that marketplace continues to intensify from smaller, targeted, yet still very viable players. IBM, HP, Sun, and Dell will have to respond to this competition or find themselves increasingly on the outside looking in on innovation in the blade marketplace.
Action Item: Users in smaller shops must not rush to blade computing, because they will pay a premium for many of the technology's benefits that could double or triple the cost of a server complex. In larger shops, the realities of complex hardware configuration to run specialized workloads, especially those featuring high write-to-read ratios, will not go away soon, despite what blade or other server technology vendors promise.
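The break-even reasoning behind this advice can be illustrated with a toy cost model. All prices below are hypothetical placeholders chosen for illustration, not Wikibon pricing data; the point is only that the shared chassis is a fixed cost that must be amortized across enough blades before the per-blade savings pay off:

```python
# Toy break-even model: blades vs. standalone servers.
# All dollar figures are hypothetical placeholders for illustration only.

CHASSIS_COST = 6000      # shared enclosure, power, cooling, console
BLADE_COST = 2500        # per blade server (cheaper: shared components)
STANDALONE_COST = 3000   # comparable rack server, no chassis required

def blade_total(n):
    """Total cost of n blades, including the shared chassis."""
    return CHASSIS_COST + n * BLADE_COST

def standalone_total(n):
    """Total cost of n standalone servers."""
    return n * STANDALONE_COST

# The chassis premium is spread over the blade count, so small
# deployments lose and only larger ones come out ahead.
for n in (2, 6, 12, 20):
    print(n, blade_total(n), standalone_total(n))
```

Under these made-up numbers the break-even point is 12 blades (the $6,000 chassis divided by the $500 per-server saving); with real 2007 pricing the threshold will differ, but the shape of the argument is the same.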
The idea behind blade computing is a good one. Strip out, share and dramatically reduce the number of components like power, cooling, consoles and the like and squeeze more costs out of distributed computing. If every application workload fit well into blade environments, there would be no other computing approach.
Unfortunately, that's not the case. Blade servers are well-suited to applications that are parallelizable, but more complex workloads with higher transaction and update activity are often not a good fit for blade architectures. The point is that blade computing works best when organizations apply a 'one-size-fits-all' strategy, meaning all the blades in the chassis are as similar as possible, ideally identical (and, of course, from the same vendor). This makes blades more swappable, easier to manage, simpler to back up, and cheaper to acquire and stock spares for. Greater diversity within a chassis defeats many of the benefits of blade computing.
Buyers must become more aware of the drawbacks of succumbing to the allure of blade computing without fully understanding its marginal costs and marginal benefits. Specifically, despite strong marketing pushes by blade vendors into small and medium-sized businesses, these organizations often don't have the scale to exploit the economics of blade servers. Often, the marginal costs of chassis and the drawbacks of sole-sourcing outweigh the incremental benefits. Frequently, smaller customers find that the lack of a critical mass of blade-friendly applications means they'd be better off buying traditional co-located servers and preserving their freedom to shop for the best server deal.
Action Item: IT must cut through the hype of blade computing and intelligently set expectations based on assessing the degree to which a critical mass of blade-friendly workloads can drive commonality and reduce costs. SMB customers will often not have this luxury; larger customers can more easily scale, but they must design commonality into blade computing infrastructures, creating pods of similar, if not identical, blade groups to support applications.
Commodity hardware is cheap, but inherently unreliable; blades are no exception. The biggest causes of failure are complex operating procedures and commodity operating systems and disks. Virtualization of blade environments can mitigate these problems.
The key is complete separation of storage from processors and ensuring there is no fixed association between an application and a physical server. By using virtualization engines such as Ardence and VMware, system software can be centralized and placed under version control. This means OS failures can benefit from a central repository of OS versions, enabling very fast OS problem resolution by, for example, reverting to a previous version of an OS. By virtualizing the storage, data can be striped across multiple arrays so that no single disk failure will cause applications to crash. By separating storage from server, all recovery files, such as journals that preserve the state of an application, can be accessed by other servers, which minimizes recovery time.
Action item: Configure blades with storage external to the servers and ensure the servers have no fixed association with applications. Focus blade virtualization projects on creating simple, robust environments, not on saving processor cycles.
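The decoupling described above can be sketched as a minimal mapping layer. This is an illustrative model, not a real product API; the class and method names are invented, and a real environment would use a virtualization engine such as those named above. The essential idea it shows is that when applications have no fixed binding to a blade and their state lives on shared storage, recovery is just re-pointing the application at a surviving server:

```python
# Minimal sketch of decoupling applications from physical blades.
# Hypothetical names throughout; no fixed app-to-server binding exists,
# so on failure an app is simply re-placed on a healthy blade and its
# state is recovered from shared external storage, not a local disk.

class BladePool:
    def __init__(self, blades):
        self.healthy = set(blades)
        self.assignment = {}            # app -> blade (never permanent)

    def place(self, app):
        """Assign app to any free healthy blade."""
        blade = next(iter(self.healthy - set(self.assignment.values())))
        self.assignment[app] = blade
        return blade

    def fail(self, blade):
        """Mark a blade dead and re-place every app it was running.
        Each app's journal lives on shared storage, so any surviving
        server can pick up its state."""
        self.healthy.discard(blade)
        for app, b in list(self.assignment.items()):
            if b == blade:
                del self.assignment[app]
                self.place(app)

pool = BladePool(["blade1", "blade2", "blade3"])
pool.place("mail")
pool.fail(pool.assignment["mail"])   # mail moves to a surviving blade
```

The design point the sketch makes is that recovery logic belongs in the infrastructure layer (the pool), not in the application, which never learns which physical blade it is on.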
As users taste more Web services, their affinity grows for applications that are simple, scalable, high-performance, and always available. This naturally increases expectations for IT to 'raise the bar' for internal applications.
Blade computing can help hardware availability, but recovery problems remain. Commodity OS suppliers, particularly Microsoft, are still stuck in architectures where the processor and OS are the fundamental units of computing, and the application has the responsibility of recovering from failure. Meanwhile, organizations like Google have developed high-availability infrastructures where the loss of any component does not impact the applications supported – the infrastructure can automatically recreate state from other components. High availability is built into the infrastructure, not added to the application.
Applications built as web services have very attractive cost and maintenance characteristics, both for applications built within an organization and for external services such as Google Apps Premier Edition. One of the challenges of architecting high-availability systems for web service applications with commodity components is application recovery time. Hardware can fail over almost instantaneously, but recovery of state requires sophisticated software that is complex, takes significant time, and is often fragile.
Action item: IT organizations should be wary of promising high-availability web services built on blades or any other commodity hardware and software. Where they exist, IT should first qualify and make available external services built on high-availability infrastructures that are protected from component failure.
John Gage coined perhaps the best phrase in the history of the computer industry, "The Network is the Computer." Ironically, it appears as though Google, a server buyer, (not HP, IBM, Intel or Sun) is building the world's largest, fastest, most reliable and scalable computer. By using inexpensive servers and assuming components will fail, Google (with the Google File System) was led to an architecture that spreads operating systems, file systems, applications and data across entire server infrastructures worldwide ensuring lightning fast response times and always-on application availability. While the world's leading server vendors still provide architectures that, for the most part, presume a one-to-one relationship between server and application, Google is paving the way for the real growth opportunity in Web services by spreading everything, everywhere.
Action Item: Web service delivery is driving demand for new blade server architectures. Vendors must re-think traditional definitions of servers and make blade computing the underpinning of new approaches to architecting network-based systems where the presumption of frequent component failure and highly distributed computing resources are fundamental to designs.
While blade computing promises many benefits, eradicating the requirement for labor-intensive software configuration in complex application environments is not one of them – despite supplier claims to the contrary. This is especially true in server consolidation projects. While blade computing can mitigate many hardware and hardware administration costs, the main challenge of any program to reduce server counts remains application, not hardware, consolidation. For example, applications demanding extreme levels of processor, I/O, and storage support (e.g., very large transaction processing systems) will continue to be better served by very large symmetric multiprocessing (SMP) complexes. In general, users should recognize if and when applications can reliably co-exist with each other, and implement consolidation plans around these considerations, not narrow concerns about server counts.
Action Item: Blade technologies do not change the constraints imposed by the challenges of merging applications in server consolidation efforts. Blade technologies are an excellent foundation in like-application environments, but do not mitigate the problems of operating extremely divergent application domains.