EMC’s entrance into the commercial Hadoop market is a low-risk, potentially high-reward move by the Hopkinton, Mass.-based storage vendor, as the market for big data processing and analytics technologies could reach into the billions of dollars in the coming years. However, the company faces significant hurdles as a new entrant into this space. We do not expect the new Greenplum HD appliance to gain significant adoption in the short term, due in part to EMC’s lack of credibility in the open source Hadoop community and the company’s dearth of software-specific sales channels and experience in this emerging space.
Nonetheless, EMC’s move signals that a major enterprise vendor is now pursuing Hadoop opportunities, with a strategy to simplify Hadoop deployments and deliver more value than current Hadoop distributions. EMC’s long-term viability in this market depends on a number of interrelated factors. Specifically, EMC’s Hadoop gambit will succeed or fail based on how well the company works with the Apache Hadoop open source community; its ability to identify true market requirements and integrate its existing storage, database and other technology assets with the Hadoop framework; the depth of its ecosystem; and its success in developing new sales channels.
Greenplum HD marries MPP Data Warehouse and Hadoop Framework
The new EMC appliance, the Greenplum HD Data Computing Appliance, announced today at EMC World 2011, combines the Hadoop framework with Greenplum’s column-oriented, massively parallel processing (MPP) data warehouse software and EMC’s commodity hardware.
The appliance is expected to be available in the third quarter of 2011 in two forms: a community edition based on Facebook’s Hadoop distribution and an enterprise edition that supports multiple languages in addition to Java and will probably include EMC professional services and those of its partners.
EMC says the Greenplum HD appliance can be deployed on a number of storage options, including HDFS, Cassandra, MapR and its own Isilon OneFS. According to EMC, it eliminates the single points of failure in the NameNode, JobTracker and other key components underlying Hadoop, and it enables real-time data processing when integrated with CassandraFS and MapR FS.
EMC also says the all-in-one appliance architecture will allow enterprises to deploy Hadoop significantly faster than by stitching together Hadoop’s individual open source components themselves and, by implication if not by direct statement, faster than with competing commercial Hadoop distributions from Cloudera and IBM.
Why is EMC doing this now?
In our view, EMC is making its move in the commercial Hadoop market at this time for four primary reasons:
- To reestablish its strategic position on Wall Street as a growth company by targeting hot markets and introducing new revenue streams in the form of commercial Hadoop distributions and services;
- To leverage its legacy storage prowess, its channels and its recent big data acquisitions (e.g., Isilon and Greenplum) to ride the growing and potentially lucrative Hadoop wave;
- To maximize its investments in Greenplum and Isilon by packaging the software and storage into a convenient appliance bundle (much as Oracle has done with Exadata); and
- To prevent upstart Cloudera from establishing its product as the de facto Hadoop distribution.
By way of background, enterprises are eager to make use of the petabytes of customer and other data being generated within different parts of the extended enterprise and on the Web each day. Thanks to high-profile users like Facebook and Yahoo!, open source Hadoop has quickly become the leading method for processing and analyzing petabyte-scale distributed data. Its use has spread from purely web-based companies to financial firms, pharmaceuticals, government, medical research and many other segments. In short, Hadoop is hot and only getting hotter.
Cloudera, the commercial Hadoop leader, is to Hadoop what Red Hat is to Linux. Red Hat has a market cap of nearly $10B, and EMC is looking for new sources of valuation growth. In recent years, EMC’s valuation growth has been driven almost exclusively by its roughly 80% ownership of VMware. In 2010, the value of EMC’s non-VMware holdings actually contracted, while the S&P 500 rose 12.78% during the same period. It is important for EMC to demonstrate to Wall Street that it has a new growth strategy outside its core business.
EMC recognizes this and appears to have settled on a strategy, evident in its recent acquisitions, marketing campaigns and today’s announcement, of positioning itself squarely at the intersection of two important and growing technology movements: cloud computing and big data.
As part of that strategy, EMC is following a path well worn by Oracle and others: bundling its analytic software with server and storage technology in the form of preconfigured analytic appliances. Data warehouse appliances are significantly faster to deploy and easier to manage than roll-your-own data warehouses, and have the added benefit of a single SKU. The appliance model has gained significant adoption in the data warehouse market in recent years, to the point that Wikibon believes it will soon become the most popular data warehouse deployment approach. EMC’s Greenplum appliances will benefit from this trend; however, it is unclear at this point whether Hadoop users will adopt this method of deployment.
Meanwhile, Cloudera, a Silicon Valley start-up that includes Hadoop creator Doug Cutting and former Yahoo engineering VP Amr Awadallah, has had the commercial Hadoop marketplace virtually to itself for the last two years. It has used that time wisely, contributing heavily to the open source Apache Hadoop project, developing its own well-regarded commercial Hadoop distribution, cultivating a stable of more than 100 paying clients, and generally building a reputation as the go-to commercial Hadoop vendor. The buzz surrounding Cloudera is nearly deafening, and EMC hopes to keep the upstart, with its considerable “cool factor,” from running away with the market.
Indeed, while EMC formed a very loose partnership with Cloudera last summer, the storage vendor has made the strategic decision to go it alone rather than partner closely with Cloudera, Yahoo, or another Hadoop framework developer. With the Greenplum HD release, EMC has, in essence, shown its Hadoop hand and gone all-in.
Competition with Cloudera is the main reason EMC is pre-announcing this initiative. By doing so, EMC hopes to freeze the market for Cloudera services and lay the marketing groundwork to position Cloudera as the less robust option.
On balance, we believe EMC is in the midst of a transformation. The company is eager to exploit interest in Hadoop and big data to kick-start a new period of growth. Wikibon believes this strategy is a sound one, but it will take time and lots of trial and error.
Will EMC’s Gambit Pay Off?
Wikibon believes EMC’s success or failure in the commercial Hadoop market depends on four critical factors.
1) EMC’s reception by the open source Apache Hadoop community
Credibility is critical to gaining adoption in a young, open source technology community like Hadoop. Traditionally, such credibility is contingent on making significant contributions to the open source project in question. In this case, EMC’s contributions to the Apache Hadoop project are negligible. The company has few, if any, engineers regularly contributing to the project, compared with Cloudera, which has dozens of engineers contributing to Apache Hadoop. Indeed, Cloudera’s entire raison d'être is strengthening the Apache Hadoop project in order to bolster its own commercial Hadoop distribution. Wikibon believes that, rather than slowly earning credibility by contributing to open source Hadoop organically, EMC will attempt to hire its way into the community, acquiring respected engineers with significant Hadoop experience from Yahoo and elsewhere. This strategy may work in the long term, but it will not pay off overnight.
2) EMC’s ability to position its Greenplum HD appliance as the most enterprise-ready commercial Hadoop distribution
In order to dislodge Cloudera from the top of the Hadoop food chain, EMC must position its Greenplum HD appliance as the most stable, highest-performing enterprise-class Hadoop product on the market. This also means sowing doubt within the Hadoop community about the robustness of Cloudera’s Hadoop distribution. And EMC has an opening: there is significant whitespace, or missing functionality, in Cloudera’s Hadoop distribution, a fact that Cloudera itself is well aware of and is actively trying to address. EMC’s challenge is to exploit Cloudera’s shortcomings while building its own credibility. EMC is a marketing machine and will undoubtedly use its vast resources to fight an image war with Cloudera. Cloudera, meanwhile, has so far had only paltry marketing capabilities of its own and is vulnerable to a full-scale marketing attack by EMC.
3) EMC’s ability to successfully integrate its Greenplum data warehouse appliance and commodity hardware with the Hadoop framework
EMC can’t just talk a good game, however. It must also deliver a well-integrated appliance that seamlessly combines its analytic database, commodity servers and proprietary storage technology with the open source Hadoop framework. EMC is in a good position to do so, as Greenplum’s MPP architecture, which runs analytics jobs in parallel, would seem a natural fit with Hadoop’s distributed nature. Its Isilon storage line is also purpose-built for large-scale unstructured data. Cloudera definitely has the edge, however, when it comes to the Hadoop distribution itself. EMC must innovate and bring value to its own distribution to edge out Cloudera.
4) EMC’s ability to develop new sales channels and distribution methods
EMC is at heart an infrastructure vendor. Its forte is selling hardware to storage administrators. Selling analytics software, even wrapped in an appliance with preconfigured hardware, is a very different business, a fact EMC is no doubt well aware of. At present, most Hadoop “buyers” are data scientists and line-of-business end-users looking to exploit Hadoop to solve a particular business problem in an end-run around IT. EMC’s recent marketing efforts have begun targeting this new constituency (e.g., EMC’s Data Scientist Summit, running in conjunction with EMC World this week), but it will take time for EMC’s sales team to master the subtleties of selling to this market. Cloudera, meanwhile, has ingratiated itself with data scientists and the larger Hadoop ecosystem. However, the quest for Hadoop market share is not a zero-sum game, and Wikibon believes both strategies could work. Missing from the discussion is the possibility of delivering Hadoop and other big data technologies as a service. Delivering Hadoop in the cloud makes intuitive sense, and we believe it is a strategy EMC and other commercial Hadoop providers should explore with service providers.
User Advice
Despite the significant progress made by the Apache community and start-up contributors like Cloudera, Hadoop is still in its infancy. Like most young open source technologies, Hadoop is, and will remain for some time, a moving target. Hadoop development is highly iterative and experimental in nature, so end-users should carefully consider the following four recommendations before embarking on a Hadoop deployment:
First, success with Hadoop in the enterprise depends heavily on end-users aligning themselves closely with the open source community in order to take advantage of the Apache Hadoop project’s latest contributions and developments. End-users should get engaged with the project, experimenting with community members’ contributions and contributing back to the project when possible.
Second, with regard to Hadoop distributions, Wikibon believes enterprises that wish to experiment with Hadoop in the near term should use Cloudera’s Hadoop distribution, which is quickly becoming the de facto standard.
Third, let EMC earn its spurs. As stated in this note, EMC has a lot of work to do before we would consider the Greenplum HD appliance enterprise-ready. Further, with its all-in-one appliance model, users who adopt the Greenplum HD appliance now risk vendor lock-in. While the benefits of lock-in often outweigh the risks, with an unproven platform in a very green market users must exercise caution to limit their exposure.
Fourth, consider EMC’s Greenplum HD appliance and Hadoop distribution once its overall solution framework has matured to production readiness. At that point, EMC’s integrated appliance approach may indeed bring significant value to enterprise end-users. In the meantime, it is worth noting that there is no reason end-users can’t run Hadoop distributions in conjunction with Greenplum’s MPP data warehouse on their own, without investing in the new Greenplum HD appliance. This type of activity appears limited in the market today, raising questions about whether a bundled appliance is needed in the Hadoop market at all.
The bottom line: a Hadoop gold rush is under way, and EMC is staking its claim. It doesn’t want Cloudera to capture the lion’s share of the value chain, and directly leveraging its Greenplum acquisition is its logical path to market.
Action Item: Leveraging data is increasingly becoming the source of competitive value for organizations, and Hadoop is at the center of that industry trend. EMC’s aggressive entry into the commercial Hadoop market is good news for end-users, because the more vendors work on commercial Hadoop distributions, the more technological innovation will occur. However, it also increases market clutter. Enterprise users should rapidly gain experience with Hadoop and identify where and how the technology can be applied and how the value of their data can be monetized.
Footnotes: Wikibon's David Vellante, David Floyer and SiliconAngle's John Furrier contributed to this professional alert.