Storage Peer Incite: Notes from Wikibon’s August 7, 2012 Research Meeting
The set-in-stone Spanning Tree Protocol (STP) of Ethernet environments has made it difficult for IT to respond with agility to rapid changes in business needs, particularly when those require network upgrades as simple as adding one new switch. Today, however, new technologies are changing this picture. The advent of new architectures paired with 10Gb Ethernet allows leading-edge companies and service providers to build a single physical network that simplifies the entire architecture. One opportunity is collapsing the siloed SAN and LAN, allowing one group to manage both, using one set of management tools and network technologies from one vendor. The SAN may remain a separate logical network, but it can now be built physically with the same switches as the LAN. And maturing TRILL (TRansparent Interconnect of Lots of Links) solutions offer one option to replace STP with a highly adaptive, flexible alternative that makes adding a switch as simple as plugging it in.
The August 7 Peer Incite gave the Wikibon community the chance to hear from a true pioneer in network implementation, Luke Norris, CEO of IaaS provider Peak Colo. This cloud service provider is experiencing 300% annual business growth. To support that, it has moved to a TRILL-like technology based on 10 Gb Brocade switches, replacing both its 1 Gb Ethernet and FC SAN. The result is that it has hugely simplified network management, cut its network costs by 50% and created a much more flexible architecture. Most important, it has improved its key service level measure: IO response time. It also has allowed Peak Colo to eliminate the traditional organizational silos of large IT departments and build a service-oriented organization structured around the business services it provides to its customers.
Important aspects of his message are examined in the articles below, and links to the audio and video recordings of the hour-long meeting are also provided.
Bert Latamore, Editor
On August 7, 2012, the Wikibon community gathered to discuss network scalability from the perspective of a high-growth service provider. Luke Norris, CEO of Infrastructure-as-a-Service (IaaS) provider Peak Colo, shared how the adoption of a converged network helps his company cope with 300% annual traffic growth.
A Converged Network
Peak Colo is on its fifth generation of infrastructure; over the last year, it has moved from a heterogeneous network environment that was made up of 1Gb Ethernet switches from multiple vendors and Fibre Channel (FC) to a single network supplier (Brocade) with 10Gb Ethernet. Peak Colo has eliminated its FC environment completely and, rather than making the incremental change to FCoE, has moved to an environment that is 90% NFS and 10% iSCSI. Luke stated that NFS fits well with the VMware environment (Peak Colo is a VMware Premier Solutions Provider) that it runs on NetApp storage.
Peak Colo claims that it does not have the infrastructure silos that have slowed enterprise adoption of converged environments. The company has requirements of resiliency and scale, which the active-active 10Gb Ethernet environment delivers. Peak Colo found that Brocade delivered the enterprise FC feature set in the VDX Ethernet Fabric. IT is focused on the application requirements, rather than silos. The typical three-to-five year infrastructure refresh puts applications in sub-optimal environments; IaaS providers offer a way to consistently stay on a more modern architecture.
Modern Network Architecture
The upgrade of the networking for Peak Colo was about much more than an increase in bandwidth. In the past, upgrades or additions/removals of switches were complicated and labor-intensive tasks; with the deployment of a fabric-based solution, these tasks became much simpler to perform. With Brocade’s subscription service, Peak Colo gained granular control of costs down to the port level, not just the switch or card. Luke spoke highly of how the elimination of Spanning Tree Protocol (STP) in favor of a TRILL-based solution increased the utility and agility of the network (see the integration action article for more detail). The deployment of Brocade VDX delivered a 50%-60% reduction in networking cost for Peak Colo, and these savings have been passed on to customers. The network provides segmentation, allowing the management of a unique SAN for each customer; this is even easier to do on Ethernet than on FC. The convergence to a single, more efficient network has led to denser and more power-efficient data centers. Peak Colo puts between 9 and 18kW of power in a single rack, which can include 240 compute nodes and up to four 60-port switches. In addition to lower costs and agile deployments, Peak Colo is also able to adopt new generations of technology and new features rapidly due to the technology and financial services received from Brocade’s offerings.
The Opportunity of Low Latency Service Providers
While early adopters of IaaS tended to be small to mid-sized companies deploying test and development environments, over the last two years larger companies with an increasingly diverse application portfolio have been looking to IaaS and cloud. Peak Colo delivers exclusively through VARs who sell to CIOs and CFOs who are looking for the agility and economics that IaaS can offer. Peak Colo has discrete cloud nodes in major geographies, allowing it to decrease service delivery latencies. In addition, flash technologies are critical for delivering fast response times. Peak Colo’s Intel white box compute offerings all have SSDs and, for applications that require extra performance, also utilize Fusion-io cards. The NetApp storage arrays are deployed with Flash Cache, and the new Flash Pools (delivered in Data ONTAP 8.1.1) are being tested. From an application standpoint, Peak Colo sees a lot of Web-centric workloads and an increasing number of Tier 1 applications, and is starting to see users requesting IaaS to support Big Data.
Action item: Virtualization requires that CIOs rethink overall architectures. Fabric architectures are designed to provide agility and flexibility to the networking team while providing better economics for both acquisition and operations. IT organizations that can overcome thinking in silos will be more suited to maintain a competitive advantage and have infrastructure that can support modern applications. Companies should also use the transition of networking as an opportunity to evaluate applications that may be suited for moving to IaaS.
For Wikibon’s August 7, 2012 Peer Incite Research Meeting, we were joined by Peak Colo’s Luke Norris, who led an excellent discussion about how modern networks need to be architected and offered some insights into how the roles of the IT department and the CIO might be affected as organizations move to embrace cloud services.
A refreshing change of pace
Personally, I found our conversation refreshing. While a believer in cloud services, Mr. Norris understands that naturally risk-averse IT organizations will be loath to simply supplant their working infrastructures in favor of outsourcing the whole environment to a third party. In other words, cloud adoption can be conducted in sensible, evolutionary steps that build confidence in the ability of the cloud to deliver.
Too often, we hear about providers wanting to basically take the place of the existing IT staff. This kind of talk makes people nervous and defensive and is a part of the reason that "cloud" gets a bad rap in many IT shops.
It’s the latency, stupid!
I’ve long believed that bandwidth is the Achilles’ heel for many cloud services, particularly for users in the many remote areas that cannot get the bandwidth services they need. However, a short discussion with Mr. Norris has changed my thinking. He showed me that it’s not necessarily sheer bandwidth that might prevent organizations from jumping on cloud. The real issue is latency. After all, even remote areas can get all the bandwidth they need if they’re willing to pay the right price, but if they are too remote, latency issues will negatively impact any procured service.
As more providers spring up in more areas, the latency issues will eventually dissipate, but that is one key factor that today’s CIOs must consider when looking at any cloud services provider.
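To make the distinction concrete, here is a minimal Python sketch, not taken from the meeting, that estimates the best-case round-trip time imposed by distance alone. It assumes signals travel through optical fiber at roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum) and ignores routing, queuing, and equipment delays, so real figures will be higher. The function name and the sample distances are illustrative assumptions.

```python
# Rough lower bound on round-trip time (RTT) set by distance alone.
# Assumption: signals travel through optical fiber at about 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0


def min_rtt_ms(distance_km: float) -> float:
    """Return the best-case RTT in milliseconds for a fiber path of this length."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    one_way_seconds = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    # Hypothetical distances between a customer site and a cloud node.
    for km in (50, 500, 4000):
        print(f"{km:>5} km -> at least {min_rtt_ms(km):.1f} ms per round trip")
```

No amount of purchased bandwidth lowers this floor, which is why the location of a provider's cloud nodes matters for IO-sensitive workloads.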
SLAs as a part of the admin skill set
Even a minor move into the cloud will require that CIOs and IT staff members make adjustments to accommodate the shift. Just as companies began to hire people with client/server skill sets in the waning days of the mainframe, companies will need to acquire skill sets that are necessary to embrace third-party provided cloud services and to integrate those products into existing on-premises services.
First, CIOs and IT staffers with data center responsibilities will need to hone their contract management and monitoring chops as the organization begins to acquire services that are key to business outcomes. I feel that those with the technical knowledge of the underlying services are natural fits for this contract management responsibility, at least in the early stages, as they will have the most success in ensuring that vendors adhere to the terms of their contracts and that the organization is actually receiving the services that it needs. What this creates is an opportunity for expansion for existing IT staff members rather than an opportunity for downsizing on the part of the organization. These IT staff members’ skill sets will play a critical role in the success or failure of cloud-based initiatives.
These admins-turned-contract-admins will need to learn that 100% SLAs are impossible and learn to negotiate SLAs that are realistic for both the company and the cloud provider. It’s easy to look at a cloud provider as an enemy to be overcome, but when both sides come together with a partnership-driven agreement that addresses the needs and challenges of both parties, wonderful outcomes can ensue.
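As a small worked example of what a realistic SLA means in practice, the following Python sketch converts an availability percentage into the downtime it permits per year. It is illustrative only: the function name and the sample percentages are assumptions, not figures from Peak Colo, and real contracts define their own measurement windows and exclusions.

```python
# Convert an SLA availability percentage into permitted downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the hours of downtime per year that the given availability allows."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    # Hypothetical SLA tiers, for illustration only.
    for pct in (99.0, 99.9, 99.99, 100.0):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% availability allows about {hours:.2f} hours of downtime per year")
```

A 100% SLA leaves no allowance at all for maintenance or failure, which is why it cannot be honored; negotiating means picking a point on this curve that both parties can sustain.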
Over time, IT staffers with data center responsibilities will also need to become integration specialists, helping the organization leverage the cloud services it buys to achieve maximum benefit.
I like to think of these integration efforts as “tightly loose.” They must be deep and broad and look like extensions of the enterprise architecture, but they need to be abstracted enough so that the company can easily shift to a new IaaS provider if the need arises.
Building such integrations also requires people with broad and deep technical skills.
The governance issue
CIOs will also need to consider the effect that cloud services may have on IT governance discussions, particularly when it comes to decisions that involve cloud providers. It would be easy to see cloud as a panacea that means that anything and everything gets done right now. CIOs need to get in front of the governance issue by making it clear that governance structures still need to perform their traditional roles. Embracing the cloud is simply one aspect of what IT will do to meet needs that involve IT governance, but it doesn’t mean that governance groups dictate such moves.
Streamline the business in baby steps and keep risk managers happy
One aspect that I particularly like about Peak Colo’s solution is that its service can become a simple logical extension of a customer’s existing data center environment. To me, this alleviates many concerns. First, CIOs purchasing such a service can do so in well-defined steps that should reduce the natural angst that will arise as IT jobs change.
From a risk management perspective, this is big. It allows “toe dipping” into the cloud waters without having to take the whole plunge all at once. It also demonstrates that the IT group retains control of the solution; control is not simply being handed over to some vendor. In fact, such services can be easily considered an enablement. They can enable IT to get things up and running more quickly than was possible in the past, and they can do it on their terms.
Loss-of-control issues become moot when cloud initiatives are approached in smaller steps and in clear and methodical ways because IT is clearly in the driver’s seat, watching out for the best interests of the company in ways that make the most business and economic sense.
Action item: CIOs need to begin now by laying the groundwork for such services. First and foremost, begin working with IT staff on contract management skills and, if possible, begin to talk about the cloud as a natural progression and make sure that IT staff members clearly understand their role in a changing IT landscape. Keeping IT staff focused on the business outcomes of their work and helping them understand that the change does not mean the end is key to a successful embracing of new services.
At the August 7, 2012 Peer Incite Research Meeting, Peak Colo’s Luke Norris laid out a blueprint for architecting modern networks. Essentially his organization had three main parameters in mind for the network architecture:
- High Availability
His strategy for achieving these results involved converging networks, virtualizing infrastructure, moving off spanning tree protocol (STP), and creating shared caching and solid state services.
Comparing legacy and modern networks, the former had separate infrastructure for voice (e.g. CAT 5), data networks (e.g. CAT 6) and storage (e.g. FC). Peak converged its networks onto Ethernet using NFS and iSCSI for the storage protocols such that all “data-ized” workloads run on a single converged data network.
Moving Off Spanning Tree
With legacy networks, organizations might have an FC SAN or older 1Gb Ethernet switches. This infrastructure used STP, which essentially builds a static map that dictates the data path. The data path is basically fixed, causing bottlenecks and IO contention. This legacy approach is highly inefficient. For example, Norris indicated that older networks were over-provisioned at an 8:1 factor (to brute force around bottlenecks).
Moreover, to add a new switch in a legacy environment, the Spanning Tree map had to be re-drawn, which took time. Also, for many networks, adding capacity meant taking a planned outage – which could be avoided with engineering gymnastics, but was still onerous.
Peak moved to a TRILL fabric for its networks to resolve this problem. More precisely, Peak uses a Brocade Virtual Cluster Switching (VCS) solution, which is TRILL-like. TRILL (Transparent Interconnect of Lots of Links) replaces spanning tree and enables fabric technology to be implemented in a mesh system that allows multiple paths between switches. This approach simplifies the network infrastructure, designs in scale, and makes adding nodes non-disruptive. As such, high-availability networks are much easier to deploy.
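The difference between a single fixed path and multiple paths can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Brocade's or TRILL's actual algorithm: the topology (two spine switches S1 and S2, three leaf switches L1 to L3) is hypothetical, a breadth-first tree stands in for the loop-free topology that spanning tree produces, and counting equal-cost shortest paths stands in for the multipath forwarding that a fabric allows.

```python
from collections import deque

# Hypothetical topology: every leaf switch links to both spine switches.
LINKS = [("S1", "L1"), ("S1", "L2"), ("S1", "L3"),
         ("S2", "L1"), ("S2", "L2"), ("S2", "L3")]


def adjacency(links):
    """Build an undirected adjacency map from a list of links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def spanning_tree(links, root):
    """Return the links kept by a breadth-first tree rooted at `root`.

    A simplified stand-in for spanning tree: all other links are blocked.
    """
    adj = adjacency(links)
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                tree.append((node, nxt))
                queue.append(nxt)
    return tree


def count_shortest_paths(links, src, dst):
    """Count the equal-cost shortest paths between two switches."""
    adj = adjacency(links)
    dist, ways, queue = {src: 0}, {src: 1}, deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                ways[nxt] = ways[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                ways[nxt] += ways[node]
    return ways.get(dst, 0)


if __name__ == "__main__":
    tree = spanning_tree(LINKS, "S1")
    print("links in use under the tree:", len(tree), "of", len(LINKS))
    print("paths L1->L2 under the tree:", count_shortest_paths(tree, "L1", "L2"))
    print("paths L1->L2 in the full fabric:", count_shortest_paths(LINKS, "L1", "L2"))
```

On this topology the tree keeps four of the six links and leaves a single path between L1 and L2, while the full fabric offers two equal-cost paths. Adding a third spine in the model would add a third path without touching the existing links, which mirrors the non-disruptive scaling described above.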
Peak has also deployed flash virtually everywhere, in the server with SSDs and with Fusion-io cards and in the storage array using NetApp Flash Cache (and potentially Flash Pools in the future).
The result of this integration is a simplified infrastructure that is much more flexible and cost effective. As a proof point, the network switches today at Peak are overprovisioned only by a factor of 2:1 versus as much as 8:1 in legacy systems. Norris believes this infrastructure will carry the service provider into the next generation of cloud applications and support a business model that designs in the flexibility on which customers can build more robust solutions.
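The arithmetic behind that proof point can be sketched as follows. The meeting notes do not define the ratio precisely, so this Python example assumes that "overprovisioned by N:1" means installing N units of switch capacity for every unit of peak demand; the function names and the 100 Gb/s demand figure are hypothetical.

```python
# Assumption: "N:1 overprovisioning" = N units of installed switch capacity
# per unit of peak demand. The notes do not define the ratio more precisely.


def provisioned_gbps(peak_demand_gbps: float, factor: float) -> float:
    """Return the capacity to install for a given peak demand and factor."""
    if peak_demand_gbps < 0 or factor < 1:
        raise ValueError("demand must be non-negative and factor at least 1")
    return peak_demand_gbps * factor


def capacity_reduction(old_factor: float, new_factor: float) -> float:
    """Return the fractional drop in installed capacity between two factors."""
    if old_factor < 1 or new_factor < 1:
        raise ValueError("factors must be at least 1")
    return 1 - new_factor / old_factor


if __name__ == "__main__":
    demand = 100.0  # hypothetical peak demand in Gb/s
    print("legacy (8:1):", provisioned_gbps(demand, 8), "Gb/s installed")
    print("fabric (2:1):", provisioned_gbps(demand, 2), "Gb/s installed")
    print(f"installed capacity drops by {capacity_reduction(8, 2):.0%}")
```

Under this reading, the same demand needs a quarter of the installed capacity. That is a statement about capacity only; it should not be equated with the cost savings quoted elsewhere in these notes.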
Action item: New architectural thinking is required to design modern networks for scale. Specifically, CTOs should understand their technical objectives and design networks with low latency and reduced bottlenecks in mind. A key enabler of this objective is moving off the Spanning Tree protocol to a TRILL- or mesh-based system that enables any-to-any connectivity and more agile change management.
On August 7, 2012, Luke Norris, CEO of Peak Colo, joined the Wikibon community to discuss how the company has designed and implemented a network for service-provider scalability. Peak Colo is a provider of Infrastructure-as-a-Service (IaaS) and colocation services, and, as such, is also a huge consumer of IT hardware and software.
When operating at this scale, Peak Colo has the liberty of quickly adopting the latest advances in technology. For instance, unlike most private data centers Peak Colo has eliminated spinning disks in all servers, instead choosing solid state drives. Peak Colo has also completely eliminated Fibre Channel SANs, opting recently for a converged server/storage network leveraging a Brocade® VCS™ 10Gb Ethernet Fabric.
In order to drive maximum operational efficiency while supporting fine-grained deployments, rapid scalability, and high availability, Peak Colo implemented a fully virtualized infrastructure for servers, storage, and networking. In addition to making it more responsive to new and existing customer demand, this approach helps the company minimize hardware, software, staffing, and training expense. Norris believes most IaaS providers will move in this direction.
When asked what he would like to see suppliers do better, Norris made an important point: Unlike IaaS suppliers, operators of private data centers are taking a more measured approach to virtualization. At the same time, many will begin to migrate workloads to IaaS providers, as evidenced by the rapid growth of Peak Colo. The challenge for IaaS providers comes when customers want to span workloads across private data centers and public cloud platforms. This is most challenging when the private data center is not fully virtualized. Norris says virtualized IT components, such as servers, networking, primary storage, security, firewalls, backup, and archive, generally interoperate well if all components are designed as virtualized resources. The same cannot be said when these resources must operate across both physical, dedicated components within the private data center and virtualized components from the IaaS provider.
Action item: IaaS providers need partners who can help them bridge between physical environments at their customers’ private data centers and the fully-virtualized environments at the public cloud data centers. IT suppliers looking to differentiate their offerings to IaaS providers should take the lead in this development.
When IaaS vendor Peak Colo upgraded its data network to the Brocade VCS 10 Gb Ethernet Fabric, in part to eliminate the limitations of spanning tree in favor of the TRILL network architecture, it triggered a reorganization of its entire IT department, says CEO Luke Norris. The upgrade replaced not just the company's old Ethernet but its Brocade Fibre Channel SAN as well, creating a single unified computing environment that used one switch family, one management suite, and one network skill set.
This simplified everything from purchasing through operations. The real impact, however, was the elimination of storage networking as a separate function from the rest of networking. Peak Colo was able to reorganize its IT department and operations to focus on services and the applications it runs for its individual clients. IT silo performance measurements such as network throughput were minimized in favor of the one measure important to its customers: IO response time.
The traditional operational silos were broken down in favor of multi-skilled teams headed by senior engineers who understand the infrastructure as a whole. The specialists are still there, but no one acts without being aware of the impact on service delivery. Even traditional three- and five-year hardware upgrade schedules have been scrapped in favor of upgrading to meet the anticipated needs of applications and their users.
Action item: The era of cloud services requires that internal IT adopt a service orientation. That involves much more than just listing IT services on a menu. IT must break down the traditional silos to create multifunctional teams focused on the needs of each service and its business users, and in particular deliver the right IO response rates combined with adequate security and infrastructure resiliency. As Norris says, “The day when a SAN tech can operate without regard for the entire infrastructure are gone.”
The argument for a single network has always been a simple one: if there is only one network (or at least one type of network), the sparing, training, and knowledge set become much simpler. While there are strong historical reasons for deploying separate networks (including FC for storage and InfiniBand for high performance clusters), today organizational boundaries (OSI level 8, politics) rather than network limitations are increasingly likely to be the reason that companies do not adopt convergence.
IaaS provider Peak Colo undertook an 18-month transition from a mix of FC and 1Gb Ethernet to a completely 10Gb Ethernet environment utilizing NAS (90%) and iSCSI (10%). CEO Luke Norris stated that Peak Colo’s infrastructure teams are focused on application requirements rather than silo preservation. Users can choose from multiple paths toward convergence. 10Gb Ethernet deployments of NFS, iSCSI, or FCoE can all provide high performance for a broad spectrum of applications. Peak Colo found that Brocade’s Ethernet solutions provided the reliability and many of the same features and management capabilities of an FC SAN. While logical divisions of the physical network may continue, there is no need to maintain separate physical networks. For customers that need a more gradual path, many adapters and even some switches deliver FC and Ethernet as a single solution set.
Virtual environments are best when paired with a homogeneous converged network, where the mobility of workloads is not slowed or limited by interfaces between different networking technologies in the physical environment. Companies that get rid of siloed networks will save on infrastructure costs while improving time to deployment, availability, and performance. Peak Colo’s adoption of a single network is also helping to move it towards managing the environment with a single pane of glass. It typically uses VMware vCloud Director and is working on a proprietary orchestration solution.
Action item: IT organizations need to be driven by applications and requirements rather than silos. Care must be taken in readjusting architectures and organizations to focus on the real value of IT. CIOs will need to determine the pace of change that makes sense for their companies, but those who fail to move rapidly will be left with uncompetitive economics of infrastructure and workforce.