Architecting a Network for Hadoop

One application that impacts the design of Data Center Ethernet Fabrics is Big Data. Hadoop runs on a shared-nothing architecture, defined as a collection of independent, possibly virtual, machines, each with local disk and local main memory, connected together on a high-speed network. This means that storage is direct-attached (DAS), not SAN; that holds even for EMC’s Greenplum solutions, as discussed towards the end of this video. Even with storage out of the mix, big data environments like Hadoop raise special network architecture considerations. The Wikibon and SiliconAngle teams had full coverage of Hadoop World 2011, including discussions of networking with Cisco and Arista Networks.


Understanding Big Data’s Impact

Jacob Rapp of Cisco explained that architecting for Hadoop requires an understanding of the data models that customers will be using. Cisco has published a white paper that provides a benchmark for a 128-node cluster. It found that availability and resiliency are the top considerations when building a Hadoop environment, since recovery of an HDFS environment would ripple through a configuration. Buffering is important to handle the bursty nature of certain big data workloads; Cisco’s recently released Nexus 3048 (low latency top-of-rack switch) and Nexus 2248TP-E (a Fabric Extender/FEX) both offer increased buffer size. Like the rest of Cisco’s Nexus line, these switches, which are optimized for bursty environments, fit into the NX-OS family, which has a single management toolset. Here is the full interview with Jacob:


Networking for HDFS

Arista Networks’ Doug Gourlay spoke about the changing landscape of networking for both big data and cloud environments on theCube (full interview here). While virtualization environments require “fat and flat” layer 2 environments, HDFS is layer 3-aware; in other words, it is routable, so the network must fit this requirement. Doug compared the architectural requirements to the Internet; it’s about putting the routers in the proper place. Many layer 2 switches also support layer 3 functionality (note that Cisco’s Nexus 2248TP-E would be paired with a Nexus 7k or Nexus 5548/5596 for layer 3; the 5k’s require a daughtercard for L3). Here is the portion of Doug’s interview discussing Hadoop and networking:



Doug’s position is that moving to a new architecture is best done first in a niche environment where it can be tested, iterated upon and then scaled out. While Cisco’s strength is in providing common management across infrastructure for various applications, Arista has been pushing into environments like HPC, Web 2.0 and Big Data.
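
To make the layer 3 point a little more concrete, here is a minimal sketch of how a routed design can be described to Hadoop. Hadoop can call an administrator-supplied topology script (configured through the net.topology.script.file.name property in recent releases) that receives node addresses as arguments and prints one rack path per address. The subnets and rack names below are hypothetical and assume a design with one routed subnet per rack; this is an illustration, not something taken from either interview.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-awareness topology script.

Assumption: each rack is its own routed /24 subnet. The subnets and
rack paths below are made up for illustration only.
"""
import ipaddress
import sys

# Hypothetical mapping of per-rack subnets to Hadoop rack paths.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/dc1/rack1",
    ipaddress.ip_network("10.1.2.0/24"): "/dc1/rack2",
}

# Fallback for anything that does not match a known subnet.
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for an IP address string."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames (or garbage) are not resolved in this sketch.
        return DEFAULT_RACK
    for network, rack in RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Hadoop passes one or more addresses; print one rack per argument.
    for arg in sys.argv[1:]:
        print(rack_for(arg))
```

With a mapping like this in place, HDFS can spread block replicas across racks, which is one reason the placement of layer 3 boundaries matters to the cluster and not just to the network team.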


Scale and Speed

While Hadoop environments support a large amount of data, the size of configurations today is much smaller than those of cloud service providers or even sizable enterprises. One of the defining characteristics of big data projects is that they are typically deployed much faster than traditional data warehouse environments. The Hadoop community and solution set are growing fast, so users should be diligent in choosing vendors and systems integrators that can help determine the proper infrastructure requirements for the applications that will be deployed.
