If you are even a semi-regular user of Twitter, you have no doubt encountered the Fail Whale at some point. With 300,000 new accounts created every day and 600 tweets per second (or TPS in Twitter parlance), robust and optimized infrastructure is critical for Twitter. Thus, it is no surprise that the company is building its own data center later this year.
That news had the Wikibon community thinking:
What other types of data center technology are powering today’s high profile social media websites?
Here is a look at six of today’s popular social media and social networking companies and ways they are leveraging data center and data storage technology in innovative and energy efficient ways.
As discussed via TechCrunch, Twitter has been using data centers managed by NTT America in the Bay Area. With a new, fully owned data center, “Twitter will have full control over network and systems configuration, with a much larger footprint in a building designed specifically around our unique power and cooling needs.” According to the Twitter Engineering Room Blog, the company is moving its technical operations infrastructure to a new facility in Utah.
The blog post goes on to say that this first Twitter-managed data center will be designed with a multi-homed network solution, a technique that increases the reliability of Internet connections in an IP network by connecting to more than one upstream provider. Perhaps more importantly, plans are in place to bring additional Twitter-managed data centers online over the next 24 months.
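To see why multi-homing helps reliability, here is a rough back-of-the-envelope sketch. The 99 percent availability figure is purely illustrative (it is not a Twitter number), and the calculation assumes uplinks fail independently of each other, which real networks only approximate.

```python
def combined_availability(link_availabilities):
    """Probability that at least one uplink is up.

    Assumes link failures are independent (an idealization).
    """
    p_all_down = 1.0
    for availability in link_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # The site is offline only if every uplink is down at once.
        p_all_down *= 1.0 - availability
    return 1.0 - p_all_down


# Hypothetical uplinks, each available 99% of the time.
single_homed = combined_availability([0.99])
multi_homed = combined_availability([0.99, 0.99])
print(f"one uplink:  {single_homed:.4%}")
print(f"two uplinks: {multi_homed:.4%}")
```

Under these assumptions, adding a second uplink cuts expected downtime by roughly a factor of one hundred, which is the basic argument for multi-homing.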
In January 2010, Facebook officially “broke ground” on its first custom data center in Prineville, Oregon. Infrastructure technologies Facebook has indicated it is leveraging include an evaporative cooling system, which evaporates water to cool the incoming air and is intended to minimize water consumption, and proprietary Uninterruptible Power Supply (UPS) technology, which reduces electricity usage by as much as 12 percent.
Artist’s rendering of planned Prineville, Ore., data center. (via Facebook)
Like it or not, Google is aggressively growing in the social media space, as initiatives like Google Wave, Google Buzz, and further social collaboration with Gmail and Google Apps play a larger role in its business.
As documented in Google’s 2009 Energy Data Center Summit, the company’s data centers have reduced overhead energy use by more than 80% when compared to typical facilities, primarily through the rigorous application of data center best practices, a machine-level UPS, and an ongoing process to improve energy efficiency performance.
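To put the 80 percent figure in context, it helps to express overhead in terms of Power Usage Effectiveness (PUE), the common industry ratio of total facility energy to IT equipment energy. The sketch below is a worked example only: the baseline PUE of 2.0 for a "typical" facility is an assumption chosen for illustration, not a number from Google.

```python
def pue_after_overhead_reduction(baseline_pue, reduction):
    """Return the PUE that results from cutting overhead energy.

    PUE = total facility energy / IT equipment energy, so the overhead
    (cooling, power distribution, etc.) per unit of IT energy is PUE - 1.
    `reduction` is the fraction of that overhead removed, e.g. 0.8 for 80%.
    """
    if baseline_pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    baseline_overhead = baseline_pue - 1.0
    return 1.0 + baseline_overhead * (1.0 - reduction)


# Hypothetical typical facility with PUE 2.0: one watt of overhead per IT watt.
improved = pue_after_overhead_reduction(baseline_pue=2.0, reduction=0.8)
print(f"PUE after an 80% overhead reduction: {improved:.2f}")
```

With that assumed baseline, an 80 percent cut in overhead means only about 0.2 watts of overhead for every watt delivered to the servers.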
Google also has paid particular attention to e-waste: 100% of the servers retired are either reused or recycled, not incinerated or dumped in landfills.
MySpace applied SSD and Flash technology as a replacement for servers that acted as RAM cache for data-intensive applications. By making the change, MySpace reduced its server requirements significantly, by 4-to-1, 8-to-1 or even 10-to-1, depending on the application.
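Those ratios are easier to appreciate with numbers attached. The sketch below applies each consolidation ratio to a hypothetical pool of 400 RAM cache servers; the pool size is an assumption for illustration, not a MySpace figure.

```python
import math


def servers_after_consolidation(original_servers, ratio):
    """Servers needed after consolidating at `ratio`-to-1, rounded up."""
    if original_servers < 0 or ratio <= 0:
        raise ValueError("server count must be >= 0 and ratio must be > 0")
    return math.ceil(original_servers / ratio)


# Hypothetical cache tier of 400 servers (not a MySpace figure).
ORIGINAL_SERVERS = 400
for ratio in (4, 8, 10):
    remaining = servers_after_consolidation(ORIGINAL_SERVERS, ratio)
    saved = ORIGINAL_SERVERS - remaining
    print(f"{ratio}-to-1: {remaining} servers remain, {saved} retired")
```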
MySpace has three data centers located in Los Angeles, CA, Chandler, AZ and Ashburn, VA, with a fourth being built in Las Vegas, NV. These sites contain more than 10,000 servers to deliver MySpace services.
While MySpace had already replaced 25 percent of its RAM cache farm with SSD/Flash storage technology, the company has acknowledged that this approach does not make sense for everyone. In some instances the CPU can be more of a bottleneck than the storage system, and the technology is often prohibitively expensive, so its value depends heavily on the use case.
In late 2008/early 2009, growing pains at Digg, the popular social news site, necessitated a move to a new, larger data center space with more room for expansion. The objectives of the move included providing ample power and cooling, expansion opportunities and a skilled, professional support staff.
According to the Digg blog post, the company’s new facility has room, power and cooling for 40 cabinets' worth of servers, with easy expansion to twice that number if needed. Digg upgraded its server hardware to dual quad-core Intel-based systems that use marginally more energy but provide twice the computing power of the existing configuration.
While not much can be discovered specific to LinkedIn’s data center technology, the engineering team at LinkedIn put together a series of blog posts highlighting the technology used for scaling simple storage within the company.
With performance and reliability in mind, LinkedIn’s “Project Voldemort” is designed “to scale both the amount of data we can store and the number of requests for that data.” The engineering team put together a presentation on how LinkedIn stores its data, which is available on SlideShare with notes and transcript.
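How do you scale both data volume and request volume at once? One widely used technique in distributed key-value stores is consistent hashing, which spreads keys across storage nodes so that adding a node relocates only a fraction of the data. The sketch below illustrates that general idea; it is not taken from LinkedIn's code or presentation, and the class name `HashRing`, the node names and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


def _ring_position(value):
    """Map a string to a stable integer position on the hash ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring (illustrative, not Voldemort's code)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes  # virtual points per node, to even out load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `vnodes` points for this node on the ring."""
        for i in range(self._vnodes):
            point = _ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        """Return the node owning the first ring point after the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


# Hypothetical three-node cluster that grows to four nodes.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"member:{n}" for n in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add_node("node-d")
moved = [key for key in keys if ring.node_for(key) != before[key]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The design point is that every key that moves lands on the new node, while the rest stay put, so capacity can be added without reshuffling the whole data set.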
Final Thoughts on Data Center Technology
On balance these examples underscore three trends:
- Large Web service providers increasingly want to control their own destiny to drive operational excellence, lower costs and better user experiences.
- Power and cooling are major considerations in data center design today, bringing back memories of the mainframe-era mentality toward facilities planning.
- Accommodating growth is a major factor for these firms as the risk of constricting growth is more threatening than the expense of building in scalability.
Increasingly, large data centers at firms such as these will set the standard for data center design and operational excellence. Their application of infrastructure and software technologies is on the cutting edge and is a key enabler of growth, innovation and monetization strategies. At the core, these firms are essentially IT shops.
Geez – maybe IT does matter after all…