EMC announced today it has integrated the Isilon scale-out network attached storage (NAS) platform with an Apache-based Greenplum Hadoop distribution to make the Big Data framework more palatable to enterprises with strict SLA requirements.
The move is designed to address a number of Hadoop’s enterprise-level shortcomings by applying Isilon’s backup and recovery capabilities and more efficient storage to the open source Big Data framework, according to EMC.
EMC will make Isilon an optional module of its Greenplum Data Computing Appliance, which also includes an Apache-based Hadoop distribution called Greenplum HD, the standard or high-capacity Greenplum database, and the Greenplum Data Integration Accelerator.
Customers can also deploy Isilon as an appliance alongside a Greenplum HD cluster via a 10 GigE connection without the Greenplum database.
EMC acquired Greenplum in July 2010, followed by Isilon in November 2010. In May 2011, EMC released its own Hadoop distribution, Greenplum HD, which leveraged MapR’s proprietary, NFS-accessible file system. That product will now be called Greenplum MR and will not support integration with Isilon. What is now known as Greenplum HD is a “homegrown,” fully Apache-compatible Hadoop distribution, according to EMC.
EMC says it is not backing away from its partnership with MapR, but it appears as though the vendor has determined it needs a viable open source, Apache-compatible Hadoop distribution to gain significant adoption among “traditional” enterprises.
“We see [Greenplum] MR as a high-performance Hadoop offering for customers that have advanced needs around Hadoop that have tried other distributions and are looking for more,” said Will Davis, Manager of Greenplum Product Marketing at EMC.
Making Hadoop “Safe” for the Enterprise
The impetus for the Isilon/Hadoop integration was feedback from enterprise customers who said they were keen on Hadoop’s ability to process and facilitate analysis of large volumes of unstructured data but were wary of the open source framework’s shortcomings, particularly the single-point-of-failure issue, said Brian Cox, Senior Director of Isilon Product Marketing at EMC. They also did not want to support a separate storage infrastructure dedicated to Hadoop.
With Isilon integrated with Hadoop, enterprises can theoretically store all their data, structured and unstructured, in one environment to support multiple applications and workloads, including Hadoop. In essence, the new arrangement separates storage from compute in a Hadoop cluster, giving administrators the ability to scale either one up or down independently of the other.
EMC wraps the entire package with its formidable services offerings, delivering what it says is the only one-stop shop for Big Data hardware, software, support and services. The company also said Isilon integration will be available in the upcoming Unified Analytic Platform, which is set to go GA sometime this quarter. UAP includes Chorus, a collaborative analytic tool for Data Scientists.
Megavendors Do Big Data Battle
For those keeping track, the Isilon-Greenplum HD integration is just the latest move in the Hadoop wars as start-ups and mega-vendors alike look to differentiate their respective versions of the Big Data framework.
Earlier this month, for example, Oracle released its Big Data Appliance, which bundles Cloudera’s Hadoop distribution and technical support services with the Oracle NoSQL database and related data integration capabilities. IBM has a slew of Big Data-related offerings, including support for Hadoop, but has yet to connect all the parts into a cohesive whole. And Microsoft plans to release a version of Hortonworks’ Hadoop distribution running on Windows Azure in March.
HP and SAP, meanwhile, have largely eschewed Hadoop for different approaches. SAP is focusing its Big Data efforts on HANA, an in-memory computing appliance designed to support real-time analytic and transactional processing. HP recently released its own Big Data platform pairing the Vertica columnar database for structured data with Autonomy’s IDOL software for unstructured data analysis.