At the start of 2011, there was only one commercial Hadoop distribution vendor on the market, virtually no Big Data application vendors with products ready for primetime, and Data Scientists were considered little more than propeller-heads working on some wacky experiments.
Well, a lot can happen in a year. As of December 2011, there are three viable commercial Hadoop distribution vendors doing battle for market supremacy, a slew of start-ups as well as stalwart software vendors getting into the Big Data application game, and Data Scientists are the new rock stars of the IT world.
I know, it’s a lot to digest. That’s why I’ve pulled together the top five Big Data stories of 2011 as covered by Wikibon analysts and our good friends at SiliconANGLE. Following a summary of the news, find links to related Wikibon research, SiliconANGLE posts and video from theCUBE.
So take a trip down Big Data memory lane, and keep an eye out for Wikibon’s Top Five Big Data Predictions for 2012 coming soon. Next year promises to be even more exciting.
1. Hadoop distribution war rages. Speaking on theCUBE at last spring’s Strata conference, Cloudera CTO Amr Awadallah said his company had no competition in the commercial Hadoop distribution market. And he was right … at the time. But in May, MapR emerged from stealth mode with its own NFS-powered Hadoop distribution and announced a licensing agreement with EMC. Then in June, Yahoo! spun out its internal Hadoop engineering unit to form Hortonworks. Since then, the three vendors have been battling it out, each touting the advantages of its business model over that of the others. While market leader Cloudera has a two-year head start on its two rivals and significant momentum with over 100 paying customers, MapR and Hortonworks are coming at it hard from both directions.
- The Hadoop Wars: Cloudera and Hortonworks’ Deathmatch for Mindshare
- MapR Hadoop Strategy Stresses Performance, Availability, API Compatibility over Open Source Code
- EMC Elbows Its Way Into Apache Hadoop with Greenplum HD Appliance
- Cloudera Sizes up Hadoop Competitors EMC Greenplum and Yahoo Hortonworks
- Ease-of-Use, Messaging Will Determine Hadoop Winners and Losers
- How Many Proprietary, Value-Add Components Will the Hadoop Community Accept?
- The Stakes Are High in the Hadoop Distribution Race
(Cloudera’s CTO Amr Awadallah on Hadoop distribution competition)
2. Big Money for Big Data. Despite a moribund economy, venture capitalists bet big and bet often on Hadoop and Big Data start-ups this year. In addition to the Big 3 Hadoop distribution vendors, each of which raised $20 million or more in 2011, the recipients of VCs’ largess included Hadapt, which raised $9.5 million in October led by Norwest Venture Partners and Bessemer Venture Partners; Datameer, which tapped Kleiner Perkins Caufield & Byers for $9.25 million in May; and Datastax, which added $11 million to its coffers thanks to Crosslink Capital in September.
Perhaps Accel Partners made the biggest splash, however, announcing at Hadoop World the establishment of the $100 million Big Data Fund. The fund will invest in start-ups up and down the Big Data stack, from infrastructure to applications, according to Accel’s Ping Li. Expect to see the payoff from all that investment start to emerge in 2012.
- Hadoop and NoSQL VC Funding is More Than $350 Million – Up 266%
- Investors Bank on Datastax, Neo
- Yahoo, Benchmark Invest $20 Million in Hortonworks on $200 Million Evaluation
- Five Reasons Why Accel Partners Announced the $100 Million Big Data Fund
- Digital Reasoning Scores Series B Funding, Looks to Expand Reach
(Cloudera’s Mike Olson and Accel Partners’ Ping Li Discuss the Impetus Behind Big Data Fund Live Inside theCUBE from Hadoop World 2011)
3. Big Vendors taking notice of Big Data. Start-ups weren’t the only vendors paying attention to Big Data in 2011. The mega software vendors each came to the realization that Big Data is for real and made corresponding moves to keep pace with the more nimble, innovative Big Data start-ups. SAP began touting HANA, its in-memory database engine, as the answer to its customers’ Big Data challenges at SAPPHIRE in June, while Oracle made news (and caused some groans) with the announcement of its Big Data Appliance at OpenWorld in October. Microsoft, meanwhile, ditched its Hadoop alternative LINQ to HPC in November, and threw its lot in with Hadoop via a partnership with Hortonworks on Project Isotope. Then there’s IBM, whose BigInsights platform uses Hadoop as its foundation, though Big Blue’s Big Data go-to-market strategy is still a bit confusing to most.
The MPP data warehouse market also continued consolidating in 2011. Following on IBM’s acquisition of Netezza and EMC’s acquisition of Greenplum in 2010, HP scooped up Vertica in March of this year, followed by Teradata’s acquisition of Aster Data in April. There are still a number of attractive Big Data targets on the market, so plan on seeing more acquisitions from the mega-vendors in 2012 as they look to fill out their Big Data stacks.
- A Primer on SAP HANA
- Oracle Tries to Hijack the NoSQL Movement with Big Data Appliance
- Hadoop Gets Another Vote of Confidence, This Time From Microsoft
- Microsoft Embraces Node.js, DevOps and Hadoop with New Azure Release
- Time for IBM to Take Hadoop Message to the Streets
- SiliconANGLE CEO Analyzes HP-Vertica Deal Inside theCUBE
- Teradata Aster’s Argyros on Post-Acquisition Benefits
- Teradata Completes Aster Acquisition
- EMC Packages its Big Data Analytics Offering in “Unified” Proprietary Platform
(NetApp, Big Data and Moneyball explored)
4. Data Scientists rock out. If you told someone you were a Data Scientist in January 2011, you’d likely get a perplexed look in return. By fall of this year, tell the same person you’re a Data Scientist and you’ll probably be asked for an autograph. While enterprises began to grasp the potential of Big Data Analytics in 2011, they also realized they need a new type of worker with a blend of computer science, social science, and math skills to realize that potential. The problem? Talented Data Scientists are hard to come by. Those who do fit the bill – like Bit.ly’s Hilary Mason and Cloudera’s Jeff Hammerbacher – are in high demand. As more enterprises embrace Big Data in 2012, that demand is only going to increase. The reign of Data Scientists as the rock stars of IT is far from over.
- Wikibon Infographic: The Role of the Data Scientist
- Data Scientists Are Rocking the Big Data World
- Cloudera’s Jeff Hammerbacher on What it Takes to be a Data Scientist
- Data Scientist Frank Coleman Discusses Big Data Inside EMC
- Bit.ly Chief Data Scientist Hilary Mason Live Inside theCUBE at Strata
- Three Things You Need to Know About Data Scientists
- EMC Launches New Data Scientist Training Course
(Former Facebook Data Scientist Jeff Hammerbacher on What it Takes to Be A Data Scientist)
5. Big Data Applications Begin to Emerge. This one is more of a prediction for 2012, but 2011 did witness the beginnings of what I believe will be a vibrant and very competitive Big Data Applications marketplace. Now that Hadoop has proven that it’s enterprise-ready, developers are getting to work building next-generation Big Data applications that sit on top of Hadoop and deliver real business value to traditional (read: non-Web 2.0) enterprises. Among them are Tidemark, a cloud-based enterprise performance management vendor whose application suite leverages Hadoop for data storage and processing; and Tresata, whose Hadoop-based applications specialize in Big Data analytics for banks and other financial institutions.
More established application vendors are getting into the action as well. Tableau Software, for example, has made its data visualization and analytics platform capable of ingesting data directly from Hive, while SAP has begun delivering both vertical and horizontal analytic applications that sit on top of its HANA in-memory database. Oracle and IBM are also working on Big Data applications, and with the $100 million Big Data Fund now available, expect to see a slew of Big Data Application start-ups hit the market in the coming months.
- The Hadoop Ecosystem Ready to Explode
- Tresata Goes Deep on Big Data for Banking
- Tresata’s Abhi Mehta on the Next Wave of Big Data Application Innovation
- Tableau Brings Data Visualization to Hadoop
- SAP Has A lot Riding on its HANA Analytics Applications
- Five Big Data Tools Built on Hadoop
- Cloudera Co-Founder Launches Investigative Analytics Application
(Tresata’s Mehta on the Big Data Revolution)
Well, there you have the year that was in Big Data. It was a busy year, of course, so this isn’t an exhaustive list. Let us know what you thought were the biggest stories in Big Data in 2011. You can leave comments below, post on our Facebook page, take the conversation to Twitter (@wikibon) or even write your own post on top Big Data stories of 2011 on our wiki. Happy Holidays.
Bonus Coverage: Check out Wikibon’s Big Data Manifesto.