Hadoop, Big Data Focus Shifting To Analytics and Visualization

With Hadoop World* less than two weeks away, expect to see an increasing number of vendor announcements regarding the analytics and visualization layers of the Big Data Stack.

That’s because, as the infrastructure layer continues to mature, vendors and, increasingly, enterprises are turning their attention to the real value proposition of Big Data: deriving actionable insight via Big Data Analytics and Visualization.

That’s not to say Big Data infrastructure isn’t important or doesn’t need improving. Tasks like writing and managing complex MapReduce jobs and networking racks of Hadoop nodes clearly still need simplifying. But the infrastructure has reached a level of maturity where it is now practical for many enterprises to shift at least some of their focus to analyzing and making use of the data, in addition to processing and storing it.

Back in September at Strata, Cloudera CEO Mike Olson (see Olson inside theCUBE at Hadoop World 2010 below) told me that improving the analytics layer of Hadoop is a top priority for his company over the next year. Indeed, Cloudera has recently inked a number of partnerships with analytics vendors aimed at optimizing the analytics layer that sits on top of Cloudera’s Hadoop distribution, CDH3. They include:

  • Karmasphere: The company is working closely with Cloudera to seamlessly integrate its analytic development platform with CDH3. The Karmasphere Analyst platform allows data scientists to explore Hadoop-based data via a SQL interface.
  • Attivio: In July, Attivio introduced new XT Modules for its Active Intelligence Engine that allow developers to combine Hadoop-based Big Data with unstructured data locked in internal emails and documents. Developers use the Attivio platform to build user-facing applications.
  • MicroStrategy: The Virginia-based business intelligence vendor has developed a Very Large Database Driver specifically for CDH3 that allows users to query data stored in Hadoop without writing HiveQL or MapReduce programs, and then analyze and visualize that data via MicroStrategy’s dashboards.

Cloudera isn’t the only Big Data vendor focusing on analytics. Rival Hortonworks, for example, recently inked a partnership with Tresata, whose cloud-based platform is tailored to Big Data Analytics for banks and other financial institutions (Tresata itself is working with Tableau Software on the visualization front). And EMC has tightly integrated its Hadoop distribution (which uses MapR’s proprietary file store) with the Greenplum Analytic Database.

In the non-Hadoop world, HP is expected to bring together two recent acquisitions – Autonomy and Vertica – into a single Big Data Analytics platform for deriving insight from both structured and unstructured data. Oracle, SAP, and IBM are also dedicating significant resources to Big Data Analytics platforms and tools.

To reiterate, there’s still plenty of work to do on the infrastructure layer of Hadoop and other Big Data approaches (Cloudera is also partnering with vendors like Informatica, SGI, Dell and VMware to do just that). But the focus of the Big Data industry is, and should be, expanding to include analytics and visualization.

This is especially important for enterprises. Hadoop and other Big Data approaches, while still somewhat novel, should not be treated as off-to-the-side science projects. Enterprises should apply Big Data approaches like Hadoop only when they’ve identified areas where Big Data will ease a significant pain point and/or bring real business value. That, in turn, requires analytics and visualization platforms and applications: tools that surface insights from Big Data and facilitate innovation, such as identifying new market opportunities or helping create new products.

Don’t deploy and start tinkering with a Hadoop cluster just for the sake of it. An initial lack of focus will likely lead to more Big Data failures than successes, sabotaging future efforts where Big Data could have significant impact. Yes, enterprises should experiment with Big Data, but only after identifying promising use cases that could improve competitive advantage and investing in the analytics tools needed to exploit them. As the vendor community, including services providers like Think Big Analytics and Lunexa, also shifts its focus to the Big Data analytics layer, enterprises will find a growing number of tools and services available to help them do the job.

(*Wikibon and SiliconANGLE are bringing theCUBE to Hadoop World again this year. The event takes place in New York City on Nov. 8 and 9. Watch live streaming coverage at SiliconANGLE.tv. We’ll have discussions with Cloudera, partners like Informatica and Teradata, as well as data scientists, Hadoop engineers and end-users.)