As big data implementations go mainstream, understanding their impact on data-center network design is becoming critical. Wikibon’s Stu Miniman recently covered the network design considerations within a data center, including how both Cisco and Arista approach the issue. As an extension to that set of considerations, the impact of Big Data on Data Center Interconnects (DCI) also warrants study.
Big Data drives Big Traffic
The Cisco Global Cloud Index projects that inter-data-center traffic will grow at a 34% CAGR to reach nearly 1 zettabyte a year by 2015, and that DCI traffic will grow faster than intra-data-center traffic over the five-year period. Enterprises are responding to the increasing demand by building multi-10 Gbps inter-data-center networks. For example, Johns Hopkins is building a 100 Gbps network to connect multiple labs and move medical and research data between them. National LambdaRail is another example of big data needing big networks (in this case, a 100 Gbps network built over 12,000 miles of fiber). The big data movement is clearly resulting in “big traffic” across data centers.
Big Traffic across WANs
Cheap compute and storage, together with highly effective big data analytics capabilities, are encouraging enterprises to collect, transmit, and store more data than ever before. This increase in raw data is resulting in higher DCI traffic, as more data needs to be moved back and forth between storage systems and, in many cases, replicated.
Most of the big data being generated is unstructured machine or sensor data and tends to be distributed across large geographies. Bringing together big data in centralized locations for efficient processing on Hadoop or other big data systems is another factor driving big traffic over the WAN. In addition to bringing source data for analysis, the results of MapReduce processing are sent to different data centers for integration with traditional data analytics tools for further processing, resulting in more WAN traffic. Finally, as Hadoop becomes mainstream and enterprises start using HDFS as a storage tier, replication of this data across clusters for disaster recovery will also result in higher traffic.
In a number of customer scenarios we have encountered, 20% or more of the inter-data-center WAN bandwidth has been dedicated to Hadoop-related traffic.
For the inter-data-center network to keep up with the growth of big traffic, a radically new approach to WAN optimization is required. WAN optimization solutions have been available for years; however, the traditional solutions were designed for branch-to-data-center traffic. Branch traffic is typically user-to-machine, exemplified by applications such as e-mail, file shares, and Web traffic. Optimizing user applications is a compute-intensive problem and requires flexible, easy-to-develop solutions to keep up with the ever-changing application landscape. As a result, branch WAN optimization solutions are almost always developed in software on x86 platforms. Since most branch network speeds tend to be tens to hundreds of Mbps, not Gbps, an x86-based solution is sufficient. Finally, user-to-machine traffic tends to be latency-insensitive and is usually not adversely affected even if the WAN optimization system in use introduces tens of milliseconds of latency.
The characteristics of big traffic, however, are completely different. Inter-data-center bandwidth requirements for big traffic are in the Gbps range, with individual flows reaching 1 Gbps or higher. Big traffic is almost entirely machine-to-machine and tends to be latency sensitive (even a passing mention of deploying systems in the path of high-speed replication traffic that may add 10 ms of port-to-port latency can be enough for your storage guy to disown you).
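To make the latency sensitivity concrete, here is a rough back-of-the-envelope sketch. It relies on the general rule that a single window-limited flow cannot exceed its window size divided by the round-trip time. The 4 MiB window, the 20 ms base round-trip time, and the assumption of one optimization device at each end of the link (each adding 10 ms per direction) are illustrative values chosen for this example, not measurements from any deployment.

```python
def max_flow_gbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one window-limited flow: window / RTT, in Gbps."""
    return (window_bytes * 8) / (rtt_ms / 1000.0) / 1e9


# All values below are assumptions for illustration only.
WINDOW_BYTES = 4 * 1024 * 1024   # assumed 4 MiB of data in flight per flow
BASE_RTT_MS = 20.0               # assumed WAN round-trip time without any device
DEVICE_LATENCY_MS = 10.0         # port-to-port latency added per device, per direction
DEVICES_IN_PATH = 2              # assumed: one device at each data center

# Each device is crossed once in each direction of the round trip.
added_rtt_ms = DEVICE_LATENCY_MS * DEVICES_IN_PATH * 2

before = max_flow_gbps(WINDOW_BYTES, BASE_RTT_MS)
after = max_flow_gbps(WINDOW_BYTES, BASE_RTT_MS + added_rtt_ms)

print(f"Without devices: {before:.2f} Gbps per flow")
print(f"With devices:    {after:.2f} Gbps per flow")
```

Under these assumptions the per-flow ceiling drops to a third of its original value, which is why a replication flow that expects 1 Gbps or more cannot tolerate milliseconds of added latency.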
Big-traffic WAN optimization is a networking problem that requires dedicated hardware (merchant silicon for switching, L4 packet processing, etc.) to process packets at rates of tens of Gbps while introducing only microseconds of port-to-port latency. x86-based WAN optimization solutions simply cannot be tuned to meet big traffic needs. Just like 10G switches and routers, 10G (big traffic) WAN optimization devices must be built using silicon for the data path instead of the traditional x86-based approach.
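The scale of the data-path problem can be sketched with simple line-rate arithmetic. The example below assumes standard Ethernet framing, where each frame carries 20 extra bytes on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap). The function names are hypothetical and exist only for this illustration.

```python
# Per-frame on-wire overhead for standard Ethernet:
# 8 bytes preamble/start-of-frame delimiter + 12 bytes inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def ns_per_packet(link_gbps: float, frame_bytes: int) -> float:
    """Time budget per frame, in nanoseconds, to keep up with line rate."""
    return 1e9 / packets_per_second(link_gbps, frame_bytes)


for size in (64, 1518):  # minimum and maximum standard Ethernet frame sizes
    pps = packets_per_second(10, size)
    budget = ns_per_packet(10, size)
    print(f"{size:>5}-byte frames at 10 Gbps: {pps:,.0f} pps, {budget:.1f} ns each")
```

At 10 Gbps with minimum-size frames, this works out to roughly 14.9 million packets per second, or about 67 nanoseconds per packet. A budget that small leaves little room for general-purpose software processing, which is consistent with the case for putting the data path in silicon.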
Big traffic is here and now for a number of enterprises. Network architects need to rethink design and vendor decisions to handle big traffic within and across data centers for successful deployments of big data systems.