Ease-of-Use, Messaging Will Separate Commercial Hadoop Winners from Losers

Apparently the rumors are true. According to a report by GigaOM, Yahoo will announce this week that it is spinning off its Hadoop engineering unit into a separate entity.

The new company, to be called HortonWorks (inspired by the Dr. Seuss character), will focus on developing its own enterprise-ready Hadoop distribution and support services based on the open source Apache Hadoop project. When HortonWorks debuts its commercial Hadoop distribution, it will be the third such product on the market, along with commercial distros from Cloudera and EMC Greenplum (See Table 1).

Cloudera’s Distribution including Apache Hadoop
  • CDH3 is the most mature commercial Hadoop distribution on the market.
  • Boasts top Big Data minds, including Hadoop creator Doug Cutting and former Facebooker Jeff Hammerbacher.
  • Has raised $36 million in venture capital but has yet to turn a profit.
EMC Greenplum HD
  • EMC has deep pockets to invest heavily in product development.
  • Greenplum brings significant expertise in massively parallel processing and data warehousing.
  • Lacks open source credibility.
HortonWorks (Yahoo spin-off)
  • Heavily involved in the Apache Hadoop project, contributing over two-thirds of the project’s code base.
  • Yahoo has the most experience of the three working with Hadoop internally.
  • Has lost Hadoop talent to Cloudera and others.

(Table 1)

Yahoo’s motivation for HortonWorks is pretty straightforward. The company wants to get a return on its investment in Hadoop and prevent start-up Cloudera from capturing what could be a billion-dollar market. Last year, Cloudera founder Amr Awadallah said his company was “the only game in town” in the commercial Hadoop market (see clip). Clearly, that’s no longer the case.


The entrance of yet another commercial Hadoop vendor is a shot in the arm for the open source, distributed computing framework for processing and analyzing petabytes and even exabytes of data. But it also adds to the confusion in the market.

In addition to the three commercial Hadoop distributions now on the market, there is also a developing ecosystem of smaller vendors that specialize in one or another of Hadoop’s subcomponents. These include DataStax, which offers a commercial version of the Cassandra NoSQL database, and Karmasphere, which developed an analytics engine to sit on top of the Hadoop/MapReduce infrastructure.

Others, including IBM and HP Vertica, have signaled they will not get into the commercial Hadoop market, but are nonetheless supporting integration between Hadoop and their data analytics platforms. Meanwhile, data integration vendors like Syncsort are developing tools to simplify moving data around inside Hadoop.

Yahoo is a little late to the game, but if HortonWorks develops a compelling product that appeals to companies outside the Hadoop mainstream (that is, beyond Web 2.0 companies, telcos and large financial firms), it has a chance to make a significant impact. I believe there are numerous potential applications for Hadoop in other industries, including retail, pharmaceuticals, and energy.

It is far from clear which of the three commercial Hadoop distributions will come to dominate the market, or whether there is enough room for all three to prosper simultaneously. The role of niche Hadoop vendors is also still playing out. Who comes out on top depends on a combination of technology developments and marketing.

Deploying and managing a Hadoop installation is a complex affair requiring significant programming expertise. The vendors that thrive will be the ones that reduce this complexity, either by making their Hadoop distributions easier to manage or by providing cost-effective services that remove the need for expensive, hard-to-find internal Hadoop programmers. They will also be the ones that coherently and effectively explain how Hadoop can help companies in all industries improve efficiencies and identify new revenue opportunities.

