It was interesting to see VMware put its toe in the water at the O'Reilly Strata conference 2012 with the announcement of the first release (1.0.0.M1) of Spring Hadoop. Spring Hadoop supports the development of applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem. This is not (another) distribution of Hadoop, at least not yet; it is simply the ability to coordinate Hadoop components (MapReduce, HDFS, Pig and Hive jobs, or anything in between) within the SpringSource ecosystem.
Big data is by definition IO intensive, and VMware still has a way to go to reduce its IO tax of 25% or more. Serious users usually run GemFire, VMware's in-memory big data framework, on bare metal.
If VMware is going to be relevant to real big data operations, it has to bring forward a new data management system that radically reduces IO overheads and takes advantage of flash technologies in far more advanced ways than FlashCache alone. Wikibon hopes that EMC will not hold VMware back from selecting best-of-breed partners to solve these problems.
Action Item: Users thinking of running big data on VMware should have a clear and detailed understanding of how VMware IO overheads will be mitigated, and on what timescale.