Clearing Up Some Confusion on the Hadoop Wars

Our friend Matt Asay, who oversees business development for streaming Big Data analytics player Nodeable (read here about Nodeable’s recent shift in business model), penned a column today sizing up the Hadoop distribution competition. Asay narrows the competitors to two – Hortonworks and Cloudera – and proceeds under the premise that only one of the two can and will survive.

Wikibon Chief Analyst Dave Vellante, SiliconANGLE Founder and Editor John Furrier and I shared our take on the Hadoop competition in a very similarly titled post over a year ago on the occasion of Hortonworks’ founding. While we go into some depth on the topic, if I were to boil our conclusion down to a sentence or two, it would be this: the Big Data market is red hot and growing quickly, and “there is likely room for both Cloudera and Hortonworks to build credible businesses.” However, one of the two is likely to emerge as the top player in the Hadoop distribution business, followed by the other at a distant second.

But a lot has happened since then. Significant venture capital has flowed into the market, even more than we anticipated, and the ecosystem of Hadoop-focused start-ups has exploded and is now thriving. I still stand by our premise from over a year ago that one vendor will eventually emerge to dominate the Hadoop distribution market. But, as we wrote then and believe now more than ever, there is more than enough room for a second player to thrive as well.

Furrier wrote a lengthy and detailed response to Matt’s piece earlier today on SiliconANGLE, so I won’t try to rewrite it. The bottom line, however, is that while Cloudera has built a solid business during its three-year existence, over the last 12-plus months Hortonworks has proved it is a real contender with its 100% open source business model, its team that includes some of the top Hadoop committers on the planet, and its laser-like focus on making Hadoop a stable, powerful and open enterprise-grade Big Data management and analytics platform. Definitely take a look at Furrier’s piece, which goes into some detail about why the Linux/Winner Take All analogy doesn’t hold in the Hadoop market.

I do want to quickly weigh in on a couple of points Asay made, however. Two criteria he uses to compare Hortonworks and Cloudera are the number of committers each employs and the number of partners each works with. As for committers, it all depends on perspective. You can look at this from the “Who has committed more patches to Apache Hadoop?” perspective, in which case Cloudera probably comes out slightly ahead. But from the perspective of “Who has committed the most lines of code that make up the foundation of Hadoop as a platform?”, Hortonworks’ engineers take the prize.

Either way, though, both Cloudera and Hortonworks employ some of the smartest minds in the Hadoop ecosystem.

Having said that, we have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog. Below, Hortonworks Co-Founder and Architect Arun Murthy, for example, discusses his efforts leading the development of YARN to move Hadoop beyond MapReduce to include more data processing frameworks (and more here on why this is important).

The reason for this, I believe, is the two companies’ competing business models. Hortonworks’ HDP is 100% open source, Apache-focused and free to download, so it makes sense that Hortonworks’ engineers would spend all of their time improving open source Apache Hadoop.

Cloudera takes a different approach. The core of its Hadoop distribution is based on Apache Hadoop, but its management and admin software, Cloudera Manager, is proprietary. Since Cloudera is betting on Cloudera Manager as its competitive differentiator, it logically follows that Cloudera engineers are going to spend the bulk of their time working on Cloudera’s proprietary software rather than committing to the open source community.

That’s not an attack on Cloudera, which I think is an excellent company that did more than anybody to jumpstart the Hadoop market back in 2009-2010 when most people didn’t even know what Hadoop was. But it is a fact.

On the second criterion, which company has the most partners, I think Asay misses the point. He gives Cloudera the advantage, as it has over 300 partners compared to Hortonworks’ 60+. But the raw number of partners is not what’s important. What is important is who has the most strategically valuable partnerships and who has the deepest partnerships with the most important ecosystem members.

Any two companies can put out a press release saying they’re partners, but it takes a commitment to work closely together on the engineering and services level to make partnerships meaningful. And that’s even truer in the open source world.

I don’t think we can say at this point who has the better partnership program, but at Hortonworks’ Hadoop Summit in June, we saw first-hand the community that has developed around open source Apache Hadoop. To say it’s a thriving community is really an understatement. Below are interviews with just a handful of Hadoop ecosystem players. Take a look at all the interviews from Hortonworks Hadoop Summit here to see for yourself what I’m talking about.

Tresata’s Abhi Mehta at Hortonworks Hadoop Summit 2012

Teradata Aster’s Tasso Argyros at Hortonworks Hadoop Summit 2012

Syncsort’s Mitch Seigle at Hortonworks Hadoop Summit 2012

The reality is we are still in the very early stages of the Big Data Era. It’s important to keep perspective and remember that strong commitments to both building out Hadoop as a world-class Big Data platform and developing deep and meaningful partnerships among a thriving ecosystem of players are the keys to Hadoop’s future.

  • John

    If you stopped the clock today I would declare cloudera the winner because they have the lead being first mover. however hortonworks is pulling up fast and they are not a gimmick at all they are the real deal.

  • Amby

    I wonder why MapR is left out from mentions of Hadoop wars. is it because it nowhere stands close to these two ? what about Enterprise scalability ? does that factor in when comparing hadoop distributors ?

  • Dan McCreary

    If MapR was not the leader why would other organizations like EMC be reselling it?