Cisco’s Rip and Replace Dilemma

Current network topologies are inadequate to meet the flexibility and scalability demands of burgeoning virtualized data center environments.  New switches and new network architectures are emerging that transform the data center to Infrastructure 2.0 (comment or edit a vendor-independent definition of Infrastructure 2.0 on the wiki).  Users should be aware that moving to this new environment is a disruptive, rip and replace initiative that requires substantial planning.  Despite this caveat, a modernization process provides the opportunity to streamline current siloed infrastructure spanning network and servers in a virtualized context.

Participate in the upcoming Peer Incite call Beyond Spanning Tree Protocol on July 27, 2010, Noon EST.  It is a chance to learn more about one aspect of Infrastructure 2.0 technology, hear what others in the community think and get your questions answered.

Traditional Networks

While not identical, the changes required for network architectures in many ways parallel what we saw in the server space which led to the adoption of server virtualization.  Let’s look at two of the current network architectural practices that need to be changed for customers that want a data center designed for virtualization and cloud adoption:

  1. Oversubscription
  2. 3-tiered architecture

Oversubscription: Server virtualization took under-utilized resources and consolidated them onto fewer devices running at much higher utilization.  Similarly, network links are under-utilized, and the network as a whole is oversubscribed, because Spanning Tree Protocol (STP) ensures a loop-free topology by disabling those links that are not part of the spanning tree, leaving a single active path between any two network nodes.  Since only a single path is active between nodes, the total bandwidth of the network can be significantly oversubscribed (see diagram on the right provided by Cisco for a “typical” environment).  Switch architectures were designed with limited bandwidth to support these oversubscribed configurations.
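
To make the oversubscription point concrete, here is a minimal sketch of the arithmetic. The port counts and speeds are hypothetical and not taken from Cisco's diagram; the function name and parameters are made up for illustration. It compares an access switch with one uplink forwarding (the other blocked by STP) against the same switch with both uplinks forwarding.

```python
def oversubscription_ratio(downlink_ports, downlink_gbps, active_uplinks, uplink_gbps):
    """Return server-facing bandwidth divided by active uplink bandwidth."""
    if active_uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("at least one active uplink is required")
    return (downlink_ports * downlink_gbps) / (active_uplinks * uplink_gbps)


# Hypothetical access switch: 48 x 1GbE server ports and 2 x 10GbE uplinks.
# With STP, one of the two uplinks is blocked, so only one carries traffic.
stp_ratio = oversubscription_ratio(48, 1, active_uplinks=1, uplink_gbps=10)

# With a multipath technology, both uplinks forward at the same time.
multipath_ratio = oversubscription_ratio(48, 1, active_uplinks=2, uplink_gbps=10)

print(f"One uplink forwarding (STP): {stp_ratio:.1f}:1")
print(f"Both uplinks forwarding:     {multipath_ratio:.1f}:1")
```

A ratio of 1:1 corresponds to the non-blocking case; anything higher means the servers can offer more traffic than the uplinks can carry.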

3-tiered architecture: Pre-virtualized server environments were siloed by application; similarly, traditional network architectures allow for physical and logical isolation of applications.  The predominant network architecture has an access layer, an aggregation/distribution layer and a core layer.  This architecture is designed for “north-south” traffic, meaning that data and services mostly flow between the access layer and the core, rather than between access layer devices, which would be “east-west” traffic.  Three-tiered architectures initially became popular as a means of balancing network utilization, performance and flexibility. However, as was the case with servers, virtualization provides an opportunity to streamline the infrastructure.

Virtualized Networks

As organizations pursue server virtualization, the requirements for the network have changed, since more highly utilized servers translate into higher network bandwidth needs.  Rather than the 3-tiered, oversubscribed solution, the new generation of switches built for virtualized environments is non-blocking (i.e. it has the bandwidth to support fully utilized ports) and is being deployed into a flatter network architecture (technically 1-2 tiers depending on vendor).  Mobility of applications between servers (such as with VMotion) requires high “east-west” bandwidth, which is not easily handled by traditional 3-tier solutions.  Spanning Tree Protocol is being supplemented or completely replaced with technologies that allow for multipathing and redundancy at layer 2.  The flatter architecture allows for a more flexible flow of traffic both “north-south” and “east-west” (see diagram on the right provided by Cisco for a 2-tiered, non-blocking environment).  The switching environment becomes virtualized both through managing multiple switches as a single pool and by blurring the boundaries between the physical switching infrastructure and the virtual switching environment that is part of hypervisors.  Unlike virtualization in servers and storage, the transition to network virtualization is a replacement rather than an extension of existing infrastructure.
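
The difference between a spanning tree and layer 2 multipathing can be sketched on a toy 2-tier fabric. Everything here is an illustrative assumption: the switch names, the topology of four access switches dual-homed to two aggregation switches, and the use of a plain breadth-first search in place of real STP root election and path costs. The sketch counts how many links a loop-free tree leaves forwarding, and how many equal-cost paths exist between two access switches with and without the blocked links.

```python
from collections import deque


def build_adjacency(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def spanning_tree(links, root):
    """Return the links kept by a breadth-first spanning tree from root.

    Simplification: real STP elects the root by bridge ID and compares
    path costs; BFS is used only to obtain some loop-free tree.
    """
    adj = build_adjacency(links)
    seen = {root}
    tree = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                tree.append((node, nbr))
                queue.append(nbr)
    return tree


def count_shortest_paths(links, src, dst):
    """Count equal-cost shortest paths between src and dst."""
    adj = build_adjacency(links)
    dist = {src: 0}
    ways = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                ways[nbr] = ways[node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                ways[nbr] += ways[node]
    return ways.get(dst, 0)


# Hypothetical fabric: every access switch connects to both aggregation switches.
access = ["acc1", "acc2", "acc3", "acc4"]
aggregation = ["agg1", "agg2"]
fabric = [(a, g) for a in access for g in aggregation]

tree = spanning_tree(fabric, root="agg1")
blocked_links = len(fabric) - len(tree)
paths_with_stp = count_shortest_paths(tree, "acc1", "acc2")
paths_with_multipath = count_shortest_paths(fabric, "acc1", "acc2")

print(f"Links in fabric: {len(fabric)}, forwarding under the tree: {len(tree)}")
print(f"Paths acc1 -> acc2 with spanning tree: {paths_with_stp}")
print(f"Paths acc1 -> acc2 with multipathing:  {paths_with_multipath}")
```

In this toy case the tree blocks three of the eight links and leaves one path for east-west traffic between the two access switches, while the full fabric offers two equal-cost paths.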

Last week, Cisco unveiled the products and vision for a 2-tiered architecture.  Its solution is FabricPath, software whose first implementation runs on the new F-series module for the Nexus 7000 series switch.  From a standards perspective, Cisco is calling FabricPath a “superset” of TRILL.  TRILL (Transparent Interconnection of Lots of Links) is an IETF standard which, when ratified (expected soon), provides an option to replace STP.  Leading the TRILL standard is Radia Perlman, who created STP and brings increased credibility to TRILL.  While Cisco is participating in the standards effort and says that the Nexus and F-series will support TRILL, in typical Cisco fashion, the company is blazing its own trail, trying to create a de facto advantage in the marketplace.  In fairness, other leading vendors are not on the TRILL bandwagon; however, Cisco has wrapped itself in the standards flag as a marketing tactic. The reality is that Cisco is trying to catch up with Juniper and HP (3Com), which already have innovative flatter architectures designed for virtual environments.

Transitioning Your Network

CIOs today are looking to get more out of their existing resources, but find themselves with 3-tiered architectures built with older switch technologies (like Cisco’s flagship Catalyst switches) that cannot deliver the flexibility and scale that they want in a virtualized data center.  Updating the network is an expensive and disruptive process with many transitions including:

  1. New hardware – not just adding 10GbE (typically requiring new cabling also), but moving to non-blocking switches
  2. New architecture – from 3-tier to 2-tier
  3. New processes – not only new management practices, but also reporting structures and team interactions. Specifically, a non-siloed virtual network is going to be better served by less siloed network, server and storage teams (especially when considering network convergence such as with FCoE).

Switches in Cisco’s newer Nexus family are non-blocking, but even companies that have made the investment in Nexus will need to spend more money and re-architect their environments if they add FabricPath when it is available in Q3 ’10.  Both Juniper – with its Project Stratus and “3-2-1” data center architecture – and HP (3Com) – with its Intelligent Resilient Framework (IRF) – have alternatives that companies should evaluate when making the move to a 10GbE next generation of switches and architecture. Cisco, HP and Juniper have dramatically different approaches to architecting next-generation networks. In short, Cisco wants to maintain its substantial lock-in advantage, HP wants to bomb pricing and Juniper wants to disrupt everything so it can steal share. Customers should understand that no matter which path they choose for virtualizing networks, they must plan for disruption and look toward developer-friendly, multi-vendor, best-of-breed solutions to minimize lock-in.

Customers considering updating their systems need to balance their asset management cycles (installed base), technology cycles (adopting innovations such as Nexus or non-Cisco) and business cycles (especially capital budgets).  As such, it is recommended that customers pilot the new configurations to determine the impact of new architectures on their stack and on change control and management practices.

Conclusions

All vendors are racing to bring virtualized networks to market, and migration to the new paradigm is inevitable.  Cisco, HP (3Com) and Juniper all have solutions in various stages of readiness. Cisco is the big dog and has the most to lose in a transition phase. Network managers should avoid expensive extensions to existing networks and implement trials of the new paradigm.  They should make the planning assumption that the business case for migration to a 2-tier architecture will become overwhelming in the next two to three years. Rip and replace decisions are not easy choices for customers and expose Cisco in particular to transitional risks.  These are three key developments that observers should watch for indicators of success in the coming transition:

  1. Cisco’s ability to effect rapid adoption of converged network, compute and storage architectures with UCS and FCoE
  2. HP’s skill at delivering a truly converged offering of networks, servers and storage to become the clear number two player
  3. Juniper’s capacity to leverage an ecosystem of cloud service providers to reach enterprise customers

  • brandontek

    Pretty good info. Will retweet…

  • http://blogstu.wordpress.com stu

    Thanks Brandon – hope you'll join us on the 27th for the Peer Incite call.

  • OmarSultan

    Stu:

    So you bring up some of the interesting dynamics driving data center network design, although I think you make a number of assumptions that I would not agree with–I am sure you are surprised. :)

    So, let's start with a bit of context. With the typical enterprise customer still having ~20% of their production workloads virtualized and only select applications running in a federated environment, there is no impending doom event for most folks to rush out and rip out anything on a wholesale basis. For many of our customers, the need for higher bandwidth to support higher VM densities can initially be met by moving from GbE to 10GbE and by using vPC, which eliminates STP and doubles bandwidth to 20Gb by making both uplinks active (other vendors offer similar approaches).

    For customers that need higher bi-sectional bandwidth driven by high volumes of east-west traffic because of things like vMotion, inter-server app traffic or super-high VM-density, FabricPath (and TRILL down the road) makes a lot of sense. However, our expectation is customers will initially deploy this selectively and the typical data center will be a mix of networking approaches. The key for our Nexus customers is the ability to transition between the two approaches (2-tier and 3-tier) in a granular manner by adding F-series I/O modules to their N7Ks. So, customers get the flexibility they need in a single chassis (L3, L2, 3-tier support and 2-tier support) along with the ability to move between them.

    At the end of the day, is there some discontinuity between a GbE-centric, 3-tier, N-S optimized data center and a 10GbE-centric, virtualization-optimized data center? Sure. But the thing to remember is that customers control the pace of change and thus also control the level of disruption they actually create. I agree that rip-and-replace is seldom desirable. That being said, it's curious that you hang that around Cisco's neck, since we arguably have the best record of investment protection. The Catalyst has been shipping for over 10 years and is not going away anytime soon. Our stated position on migration is to continue to leverage your Catalyst investment and migrate to Nexus when you see a compelling reason to do so (i.e. 10GbE density, OTV, etc). The benefit is that customers continue to maintain consistent features and manageability as they make the transition.

    From a standards perspective, the data plane portion of TRILL is locked down (we don't expect that to change) and the F-series supports it. Once the control plane portion of TRILL is nailed down, we can turn up TRILL support. In the interim, we have a solution for our customers to deploy. Once TRILL is actually available, they can choose between TRILL and FabricPath–don't really see much downside to that for customers.

    Finally, the comment about playing “catch-up” is kinda curious. If you are talking about product, we will have CY Q3 availability, so I am not sure how that is playing catch-up. I know you know about R&D cycles, so you know we have been working on this for some time. If you are talking about broader vision, then we have been talking about virtualized data centers since 2007 and shipping converged DC fabric solutions since 2008, which is well before anyone else and before some folks even decided to enter (or re-enter) the market.

    Regards,
    Omar Sultan
    Cisco

  • http://blogstu.wordpress.com stu

    Omar,
    I know that Cisco would always like to “Do Both”, but many customers don't have the space, power or budget to carefully balance between this old and new world as you describe. Catalyst may not be going away, but it is obvious that customers who want the latest features and functionality face a migration to the Nexus product line. I hang the disruption on Cisco because of the impact that changing the huge install base will have. On messaging, I've heard a lot from Cisco on adjacent markets and convergence with compute and storage and marketing around the high level trends of virtualization and cloud. I've seen your competition discussing architectures and delivering product that flatten the network prior to last week. What I'd like to see from Cisco is more detailed roadmaps that explain what I can really do today, the milestones that march towards the vision and how this all ties together with standards.
    Thanks for the dialogue,
    Stu

  • OmarSultan

    Stu:

    I am not suggesting that customers run parallel networks, but I am suggesting that customers can migrate from Catalyst to Nexus at a pace they control–it is important that we have a continued commitment to Catalyst so customers retain that control, as opposed to announcing EoL on the Catalyst to force migration to the new platform as other folks have been known to do. The reality is customers can deploy Nexus in a tactical feature-driven fashion where there is immediate need and then slowly transition the rest of the infrastructure as part of the normal refresh cycle. Our customers can do this because of consistency of config and features (i.e. Netflow) across the families. To that end, I'd argue that moving to Nexus is the least disruptive option of those available to customers. Yes, we have a large installed Cisco base, but we are not the ones who are looking to do the ripping and replacing. :)

    As for roadmaps and the like, perhaps a call would serve us both better–too much typing otherwise. :) Ping me and we can set something up to chat.

    Regards,

    Omar Sultan
    Cisco

  • http://www.google.com/profiles/jeremyarnold Jeremy Arnold

    Nice work Stu!

    HP understand that users want to sweat their assets. The FlexFabric story is worth listening to, and HP recognise that customers shouldn't have to implement a FlexFabric network all at once. Customers should see benefits immediately with incremental adoption of server edge solutions and can gradually add additional capabilities as legacy technology investments mature and in keeping with the desired rate of migration.

    HP FlexFabric will deliver true “networking as a service” capability to the various consumers of connectivity within the data center. It will provide a unified infrastructure across servers, storage, and networking that can dynamically adapt to the business demands of a highly virtualized operation – without a forklift upgrade, and by adapting to existing customer workflows.

    Jeremy Arnold ~ Solution Architect HP Networking UK&I
    @JezAtHP

  • http://etherealmind.com Etherealmind

    The challenge with the thrust of your argument is that you assume that 'flat networks' will scale and operate consistently. Although some very good people are developing L2 segmentation technologies, such as TRILL, there are no guarantees that this fabric revolution will be successful.

    Old timers bitterly remember the catastrophic failures of bridged networks in vendors' previous attempts at flat networks, and these technologies remain unproven. A little less optimism would certainly be in order.
