Cloudera Sizes Up Hadoop Competitors EMC Greenplum and Yahoo Hortonworks

Yesterday my Wikibon colleagues and I had the pleasure of speaking with Charles Zedlewski, Vice President of Products at Cloudera, in anticipation of today’s Cloudera Enterprise 3.5 release. In addition to discussing the new features in today’s release, Zedlewski also talked about Cloudera’s position in the now three-member commercial Hadoop distribution market.

EMC joined the commercial Hadoop club in May with the release of its own enterprise distribution, which includes MapR’s distributed file system. This morning, Yahoo spun off its Hadoop engineering unit to form HortonWorks, a new company that will offer its own enterprise Hadoop product soon.

Though Zedlewski was diplomatic with his opinions, I think his comments are a good indication of how Cloudera plans to compete in the increasingly crowded Hadoop market.

Cloudera on EMC Greenplum with MapR

Zedlewski said he found it odd that EMC claimed Hadoop was not ready for enterprise production environments when the company announced its entry into the Hadoop market.

“It’s a little sad to market a product and then say the product’s not good,” Zedlewski said when asked directly about EMC’s marketing message. “It’s a curious approach. I think all evidence points to the contrary.”

He pointed out that the open source Apache Hadoop stack is in use and running today on 100,000 or more servers. Cloudera customers in particular, a number of them Fortune 500 companies, are running mission-critical workloads with tight SLAs on Hadoop clusters made up of thousands of servers, multiple petabytes of data and hundreds of concurrent users.

“If Hadoop isn’t ready for production, our Fortune 500 customers sure aren’t telling us that,” Zedlewski said.

Of course, EMC was attempting to use its experience delivering production-ready solutions to large enterprise customers to distinguish itself from Cloudera when it announced its Hadoop distribution. I think that distinction might have been more effective had EMC gotten into the commercial Hadoop market earlier because, as Zedlewski points out, Cloudera has proved itself in a number of production environments.

In fact, Zedlewski said establishing Cloudera as an open source company was key in this regard. When Cloudera was founded in 2009, “there was no way a Fortune 500 company would bet serious production workloads on a 20-person start-up and they had no other recourse,” Zedlewski said.

Cloudera has more than doubled in size since then, with a headcount now of over 100 employees. But that’s still pretty paltry compared to EMC, a company of 45,000, so I actually think Zedlewski’s assessment of the impact of Cloudera’s size and age applies almost as much today as it did in 2009. The more compelling argument in favor of Cloudera over EMC is the former’s open architecture over the latter’s proprietary approach, an argument Zedlewski recognizes.

“Because we are actually fronting an open system that has a diverse community of vendors backing it, it means that for a large enterprise your investment is protected,” Zedlewski said. “Yes, we think Cloudera is going to be the best provider of what you need to run Apache Hadoop in production, but we’re not the only (open source) provider and that gives the enterprise customers a level of insurance and investment protection that they don’t have if they go with a proprietary system.”

EMC also lacks open source credibility. I couldn’t find any EMC engineers listed on the Apache Hadoop contributors page (please correct me if I’m wrong here). And I agree with Zedlewski when he told us, “The definition of contribution in the Apache open source world is code. If you’re not contributing code you’re not contributing.”

The bottom line for Cloudera: “We believe that people want to buy into an open platform,” Zedlewski said. I agree, for the moment. Hadoop is still a young and emerging technology with many moving parts, and no company wants to get stuck with a proprietary Hadoop fork that goes nowhere, a Hadoop dead end if you will. (That’s partly the reason HP Vertica decided not to release its own commercial Hadoop distribution, Vertica’s Colin Mahoney told us live inside theCube at HP Discover.) That could change when the technology matures more, but for now open is the way to go.

Cloudera on Yahoo spin-off HortonWorks

We spoke to Zedlewski before Yahoo unveiled HortonWorks this morning and he wisely chose not to comment on the yet-to-be-made announcement. But he did speak about Cloudera’s relationship with Yahoo and its contribution to the Apache Hadoop project.

Cloudera and Yahoo engineers have worked closely together on Apache Hadoop contributions for years, Zedlewski said. Cloudera often reviews Yahoo’s patches and vice versa. He said he expects this type of relationship to continue, HortonWorks notwithstanding.

But Yahoo has focused its contributions on improvements that benefit itself and perhaps other large Web companies, Zedlewski said. Yahoo doesn’t have any experience developing Hadoop features for customers in a variety of industries, which is what HortonWorks must do to survive.

“In the case of Yahoo as a large Web property, they’ve been improving the system where they thought it was needed in terms of requirements that Yahoo had,” Zedlewski said. “From our perspective, we serve the enterprise. So we’ve been adding features and capabilities and quality improvements where we’re hearing feedback from our enterprise customers and the parts of Hadoop they need better.”

I think Zedlewski is on the right track here. The real test for HortonWorks is not to prove it can build and run a robust Hadoop deployment for large Web 2.0 companies. We know it can do that. For years, Yahoo has been using Hadoop to process huge volumes of click-stream data to match ads with users, detect spam in Yahoo Mail, and pick top stories for its homepages.

But can HortonWorks add similarly compelling features to its distribution to attract non-Web 2.0 companies? If it wants to succeed in an increasingly crowded market (and I think we’ll see more entrants over time), it’s going to have to.

