Strata Conference: The Continuing Story of Hadoop

I’m back from Strata Conference and after three days, 16 keynote presentations, countless sessions, 20+ hours of live coverage via theCUBE, and two very long flights from Boston to Silicon Valley and back, these things I’m sure of:

  1. Big Data, namely Hadoop, is for real.
  2. It still has some maturing to do.

You know a technology is headed to the mainstream when the two “Elite” sponsors of the premier event designed to showcase that technology are Microsoft and EMC. Neither company is known for adopting and promoting emerging open source technologies, to put it mildly. But there they both were at Strata Conference, the event dedicated to open source Big Data approaches like Hadoop and NoSQL, topping the list of event sponsors. They were followed not far behind by fellow IT giants and Strata “Impact” sponsors IBM and Oracle.

Strata Conference Elite Sponsors

It would be easy to write off these sponsors as opportunists looking to capitalize on the Big Data Bubble. That is, it would be easy if they weren’t each making serious technological and financial commitments to Big Data, which they are. Microsoft in particular is aggressively courting the Hadoop market, forging an alliance with Hortonworks to bring Hadoop to Windows Azure and Windows Server in short order. The company is also working on making Hadoop and Hive-based data accessible via familiar end-user tools like Excel PowerPivot, PowerView and SQL Server Analysis Services.

Then there were the attendees at the conference. Attendance nearly doubled this year to close to 2,500 people from 1,300 in 2011. But just as important as the number of attendees was the mix of personalities. Like last year, there was an abundance of smart developers and Hadoop committers in the mix, along with exciting start-ups like Tokutek and Skytree. But there were also a lot more suits this time around. The business side of the enterprise was well represented at the show, and an entire track was dedicated to showcasing how Hadoop can be applied to various vertical markets to address pressing business problems.

When both IT heavyweights like Microsoft and CIOs at mainstream enterprises begin showing up at Big Data events, it’s a good bet there’s some substance behind the hype.

That said …

Hadoop still has some terrain to cover before it can be considered what I’m calling “traditional enterprise-ready.” Clearly Hadoop is ready for the Web companies of the world (Facebook, Zynga, Klout, Twitter, etc.) as well as Big-Data-as-a-Service providers (AWS, Tresata) whose value proposition is abstracting away Hadoop’s complexities for end-users. We’ve also seen a number of successful Hadoop deployments at large Fortune 500 companies like JPMorgan Chase. But mainstream enterprises, most of which lack the in-house skills to manage Hadoop clusters and related data integration tools and end-user applications, are definitely looking for more improvements before going all-in on Hadoop.

This was evident from the numerous vendors at the show who claimed their products and/or services supplement Hadoop with the needed backup and recovery, security and easy-to-use administration capabilities required to spur widespread adoption. Among them were vendors that replace HDFS and its related shortcomings with either their own proprietary file store (MapR) or competing NoSQL database (DataStax); vendors that have been working feverishly on open source Next Generation MapReduce (Hortonworks and Cloudera) that promises to, among other things, add real-time streaming capabilities to Hadoop; and vendors peddling supplemental management (Pervasive Software) and security software (Zettaset) to make working with Hadoop easier and faster.

Then there are Big Data Applications, those tools and technologies that enable end-users to analyze Big Data or otherwise put insights gleaned from Big Data to use. In short, there weren’t many polished Big Data apps on display at the show, which was a big disappointment for me (and not just because I predicted in January that 2012 would be the year of Big Data Applications … but hey, there are still 10 months to go!). As Battery Ventures’ Mike Dauber put it on theCUBE, “A ton of money is going into the infrastructure layer, but if you move just one click up the [Big Data] stack [to the application layer] there isn’t a lot going on.” He added, “Really the only company out there that we were seeing last year that really had any level of traction was Datameer.”

Clearly this needs to change if Hadoop is going to fulfill all its heady expectations. There’s little use in building out and maintaining a giant Hadoop cluster and filling it with tons of unstructured data if there are no applications available to sit on top of the stack to actually make use of all that Big Data. Companies like Datameer are making strides, as Dauber points out, but we’re still waiting for the great Big Data Application Wave to wash over us.

Finally, beyond overcoming Hadoop’s technological limitations, Strata reinforced to me that we as an industry have yet to come to grips with the cultural implications of Big Data. Put another way, Big Data allows us to perform new, more powerful types of data analysis, but that doesn’t mean we always should use them. Why? Because Big Data can freak people out. Case in point: there was a lot of talk at the conference about the recent New York Times Magazine piece outlining Target’s use of predictive analytics to target customers with ads. Problem was, Target did such a good job that it figured out a teenage girl was pregnant – and sent her coupons for baby-related products – before her father knew. Needless to say, he wasn’t happy.

Now, I’m not saying Target is right or wrong to use Big Data the way it did, but there needs to be more discussion around when particular uses of Big Data technology are appropriate and when they are not, with the eventual goal of developing some level of industry standards relating to privacy considerations. Of course, companies must be allowed to use Big Data in new and innovative ways, otherwise it wouldn’t be much of a competitive differentiator. But if too many public relations disasters such as the Target story occur first, we could end up facing rules and regulations around the use of Big Data imposed on us by the feds. And that, my friends, would really stifle innovation.

So that’s the long and the short of it from last week’s Strata conference. Hadoop is making strides but still has some maturing to do. If you’ll pardon the tortured analogy, Hadoop today is akin to a third-year, second-string NFL quarterback from a Division II school. He came into the league with great raw skills, steadily added to his capabilities by learning to read pro defenses and make solid decisions, but is still a year or more away from taking over the starting job. He’s close, but not there yet. So is Hadoop.
