If, as Wikibon co-founder and CTO David Floyer says, trying to capture and process all the unstructured data flooding enterprises and large government agencies today sometimes makes you feel like “a snake trying to swallow a basketball,” the keys to accomplishing that feat are virtualization and collaboration. This is the argument that Richard Snee, marketing VP of EMC's new Enterprise Computing Division, formerly startup Greenplum, made in an interview with Wikibon CEO and co-founder David Vellante and SiliconAngle founder John Furrier on Siliconangle.tv from Oracle OpenWorld 2010.
The partnership between the EMC division and Cloudera, announced at Oracle OpenWorld, illustrates the point, Mr. Snee said. Cloudera's Hadoop-based software for capturing, formatting, and managing the tsunami of unstructured data washing over large public and private organizations today is vital to achieving the Greenplum vision of combining data and processing on a massively parallel engine to create the analytic database.
“The Hadoop movement is in some ways the wave of the future,” Snee said. “In our customers we are seeing more and more use of Hadoop and Cloudera's distribution of Hadoop. So it was a natural transition for us. ... The two of us working together will provide some powerful results for our customers.”
The result, he says, is that customers can use Cloudera's software to prepare and stage massive amounts of unstructured data in near real time for import into the RDBMS running on Greenplum's massively parallel hardware, bringing the data as close as possible to the processing to support very high-speed deep analysis.
Hadoop, said Mr. Furrier, grew out of the Yahoo environment, which is characterized by large numbers of small, light apps that generate chunks of data of different kinds: structural, behavioral, and so on. By themselves these chunks may not provide much information, but in aggregate they can reveal emerging trends, marketing opportunities, and other highly valuable information, provided they can be analyzed quickly enough to extract it. Today that kind of environment is becoming generalized in the cloud as users adopt highly mobile platforms such as the iPad, where everything takes the form of small, light apps that users consume as virtual popcorn.
Cloudera and EMC believe they are solving the basic problem of managing unstructured as well as structured data, not just to preserve and search the raw data to meet compliance requirements but to extract useful information that identifies business trends and opportunities in real time and supports better decision-making. Doing that, Mr. Snee says, means stepping back and developing a holistic vision that includes the opportunities as well as the challenges, “and then making thoughtful, incremental steps in providing solutions to your customers.”
“I think you will see a diverse set of tools and form factors, whether it is software that runs on commodity hardware or an appliance that fits within that environment, within that enterprise cloud environment,” he said, implying that the Cloudera alliance may be the first of several.
“We have been driven by this new community of hard-core data scientists – data rock stars – that are intense about doing deep and powerful analytics, where it's no longer a singular task,” he said. “We have a responsibility to provide this framework, whether it is for applications or for people, where there is this connectedness and the ability to leverage one another's work.”