The Big Data community has been waiting in anticipation for Ben Werther’s start-up Platfora to come out of stealth mode and reveal its grand vision since early summer. Well, that day has come and Werther’s vision for Platfora is indeed ambitious.
Platfora today announced it raised $5.7 million in Series A funding led by Andreessen Horowitz, with additional support from In-Q-Tel. In an accompanying blog post, Werther said Platfora has developed a platform to allow business users to interactively explore large data sets stored on Hadoop and create multidimensional, predictive dashboards and reports.
Currently, manipulating and analyzing Hadoop-based data requires significant expertise in mathematics, statistics and distributed computing. In most production environments, this means highly skilled data scientists toil away on complex analytic platforms that sit atop Hadoop to produce meaningful analysis, analysis which – hopefully – trickles down to business users in one form or another. In some scenarios, Hadoop-based analysis is ported to traditional data warehouses where business users can explore it with business intelligence (BI) and data visualization tools.
Werther further explains how Platfora works:
Platfora’s breakthrough is a combination of server technology, user experience innovation, and data science. Our platform works with existing Hadoop clusters (Cloudera, MapR, Amazon EMR, etc.), and automatically turns the questions of business users into Hadoop jobs that synthesize and distill Hadoop datasets into dimensional and predictive dashboards, reports and insights. The system intelligently drives Hadoop to create and maintain ‘work products’ — highly compressed partial results that are refined at the click of a button to achieve sub-second report delivery, analytics overlay, and drilldown performance.
Though he doesn’t say so explicitly, it appears that Werther is implying that Platfora’s platform takes the place of data scientists within the enterprise as described in the above scenario. (Interestingly, Werther was formerly Director of Product Strategy at EMC Greenplum, which has been pushing the role of the data scientist, even hosting a Data Scientist Summit in May.) Not only that, but Platfora’s technology also removes the need for traditional data warehouses, ETL tools and BI applications, according to Werther. Instead of porting Hadoop analysis to a data warehouse and BI environment, the idea is that an organization would store all its analytic data inside Hadoop and business users of all stripes could analyze it with Platfora’s technology.
Platfora’s overarching goal, as I see it, is nothing less than to upend the current Big Data analytics model by removing the middleman known as the data scientist and simplifying the process of deriving insights from Hadoop to the point that any business user can do so unassisted.
All this sounds wonderful, like a Big Data Utopia: A single, scalable, inexpensive environment to store all of your structured and unstructured data (Hadoop) and a single, elegant, powerful, easy-to-use platform from which users of all experience and skill levels can analyze all that data (Platfora).
I like the vision in theory, but I have doubts about its practicality. Specifically:
- Deploying and managing Hadoop clusters is a complex affair. While Platfora aims to eliminate the need for sophisticated quants to run the analytics, organizations will still need experienced and well-trained admins to run the underlying Hadoop clusters. There is a well-known dearth of such workers currently on the job market and most organizations lack the internal talent needed to get production-level Hadoop deployments off the ground today. This will improve with time as current DBAs ramp up their Hadoop skills, but for now it would seem Platfora’s potential customer base is rather limited.
- Well-entrenched data warehouse and BI vendors view themselves as complementary to Hadoop, not as competitors to it, and aren’t going down without a fight. Their technologies are critical parts of IT infrastructures at many organizations, and most are developing their own Big Data strategies. Most notably, EMC Greenplum has its own Hadoop distribution tightly integrated with its data warehouse platform to allow data to flow between the two. Other data warehouse and data integration vendors have built Hadoop connectors, as has Cloudera. Platfora needs to prove to the industry that its model is more effective.
- Before business users can effectively analyze data they must understand how data relate to each other. This is hard enough when dealing with a few hundred gigabytes of structured data in a traditional data warehouse, let alone terabytes or petabytes of unstructured (and largely unfamiliar to business users) data in Hadoop. Platfora seems to address this issue with pre-defined analytic functions built into the platform, but it will be difficult to cover all possible analytics scenarios. Business users will be tempted to experiment with new types of analysis and may quickly find themselves outside their pay grades.
- Finally, Platfora’s vision for organizations to store all of their analytic data inside Hadoop will be difficult to achieve for cultural and sometimes legal reasons. As happens in traditional data warehousing and BI projects, some stakeholders are reluctant to relinquish “ownership” of their data and turn it over to a central repository or, in this case, a Hadoop-based file store or NoSQL database. In some cases, compliance regulations restrict how certain data can be used, who can view it, and where it lives. In these cases, traditional, siloed data warehouse environments are still needed. Hadoop’s security capabilities are also still works in progress.
Platfora is without question a vendor worth watching, but I think its goal might be overly ambitious. At the very least, Werther has set the bar for success extremely high. Hadoop is still an immature and rapidly developing software framework and I think an iterative approach towards Big Data analytics is the more prudent model. If Platfora can overcome these obstacles and achieve its vision before competing Big Data analytic and visualization approaches take hold, it has a chance to conquer the Big Data universe. But Werther and his team are up against the clock.