Big Data is clearly the Next Big Thing for businesses. It enables analysis of huge amounts of unstructured and semi-structured data alongside traditional structured data to answer basic business questions that were unanswerable just a short time ago. Businesses are already using Hadoop and its associated analysis tools to answer questions ranging from “Why do my customers hate my service and how can I fix that?” to “How can I influence prospects to buy my products?” to “What are the concerns of the customer who just walked into my bank?”
Despite its promise, however, Big Data analysis has so far remained the tool of the rich. SMBs simply do not have the resources to build Big Data systems in-house or to attract and retain people with the rare, valuable skills needed to run those systems.
Now a new Platform-as-a-Service company, the Treasure Data Cloud Data Warehouse (http://www.treasure-data.com/), has come out of stealth mode to announce its unique Big-Data-as-a-Service offering. Treasure Data has built a Hadoop-based Big Data warehouse system on top of Amazon using some clever custom software it has developed in-house, some of which it has donated back to the Open Source community. While most Hadoop/Big Data companies, like Cloudera and Hortonworks, are basically technology companies whose clients are the IT groups of large enterprises, Treasure Data is focused on delivering services directly to business users, says CEO Hiro Yoshikawa. All those users need to know is how to use their company's business intelligence tools; Treasure Data handles the technological challenges.
Treasure Data's co-founders Yoshikawa, a Red Hat veteran, and CTO Kazuki Ohta, who founded the largest Hadoop user group in Japan and possibly the world with 1,500 member engineers, have years of experience in Open Source software. One issue they saw immediately was data loading: given the huge amounts of data often involved, loading takes too long and is too complex, and Hadoop lacks a data capture and transformation tool. So soon after founding the company in June 2011 they started developing their own, a standard JSON data capture and format transformation tool called Fluentd, which they released as an Open Source project. They then incorporated that into an extended version, TD-Agent, which allows Treasure Data users to transform and upload any kind of data from any kind of data source to any data warehousing system, including Treasure Data. It supports high-performance, parallel batch loading to multiple concurrent targets with a continuous feed that reduces subsequent load times and enables near-real-time or event-based analytics.
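To make the idea of JSON data capture and format transformation concrete, the sketch below turns raw log lines into tagged JSON events. It is only an illustration under stated assumptions: the log layout, the `LINE_RE` pattern, and the `to_event` and `to_json_lines` helpers are hypothetical and are not the API of Treasure Data's tools.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical log layout: "<host> <METHOD> <path> <status>".
# A real source would need its own parser.
LINE_RE = re.compile(
    r"(?P<host>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)


def to_event(tag, line, now=None):
    """Parse one raw line into a structured event, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])  # store the status as a number
    stamp = (now or datetime.now(timezone.utc)).timestamp()
    return {"tag": tag, "time": int(stamp), "record": record}


def to_json_lines(tag, lines, now=None):
    """Convert raw lines to JSON strings, skipping lines that cannot be parsed."""
    events = (to_event(tag, line, now) for line in lines)
    return [json.dumps(event, sort_keys=True) for event in events if event is not None]
```

Once every source is reduced to the same tagged JSON shape, the same upload path can serve many kinds of data, which is the point the paragraph above makes about loading.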
Another problem they identified was that Hadoop analysis requires that users learn MapReduce, a new and complex programming model. To fix that, they have developed an SQL layer for Hadoop Hive that translates standard SQL queries into MapReduce jobs. This allows users to work with any standard Business Intelligence (BI) tool just as they would with an RDBMS data warehouse.
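As a rough illustration of what such a translation involves, a query like SELECT status, COUNT(*) ... GROUP BY status can be expressed as a map step, a shuffle step and a reduce step. The sketch below runs in a single process and uses hypothetical function names and a hypothetical `status` field; it shows the general MapReduce pattern, not Treasure Data's or Hive's actual query planner.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit a (key, 1) pair per row, the map side of COUNT(*) ... GROUP BY status."""
    for row in rows:
        yield row["status"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key to produce the per-group counts."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(rows):
    """Single-process stand-in for the whole job."""
    return reduce_phase(shuffle(map_phase(rows)))
```

For example, `count_by_status([{"status": 200}, {"status": 404}, {"status": 200}])` groups the three rows into two keys and counts them. A business user writes only the SQL; the translation layer generates the equivalent of these phases.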
And to increase the speed of Hadoop analysis, they have developed a columnar database to replace HDFS. This allows the analytics application to read only the relevant data columns rather than loading the entire data set, increasing efficiency.
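The benefit of a columnar layout can be seen in a small sketch. It assumes every row has the same fields; the `to_columns` and `column_total` helpers are hypothetical and say nothing about Treasure Data's actual storage format.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list per column name."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_total(columns, name):
    """Aggregate one column; no other column is read."""
    return sum(columns[name])
```

With rows stored this way, a query that sums a single `amount` field reads one list instead of scanning every field of every row, which is where the efficiency gain described above comes from.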
The result is that while it often takes months for companies to build and test an internal Big Data cluster, install and test Hadoop, and capture and load the data, a company can get a Big Data data warehouse operating and providing answers to business queries in less than a week on Treasure Data. The system is already waiting, fully tested and fully scalable; all the user needs to bring is the data.
Yoshikawa says they started Treasure Data with SMBs in mind, but ironically their initial customers are all either large enterprises or online gaming companies whose core business requires Big Data analysis. Because they have just come out of stealth mode, they cannot talk about their customers, but they hint at a large automotive company and a large retailer. They say, however, that they are not working with the central IT groups in these worldwide enterprises but rather with smaller lines of business (LOBs), divisions or, in the case of the retailer, individual retail outlets. These resemble SMBs in that they do not have the large-scale IT resources to support Big Data analysis in-house but, perhaps because they are parts of large enterprises, understand how they can use Big Data analysis to gain competitive advantage in their markets.
Geographically, their clients include companies in Japan, where they have their engineering group and financial backers; the United States, where Treasure Data's headquarters is located; and the European Union. They are also close to closing a deal with a company in Turkey. With the exception of a couple of countries like Iran that are very restrictive about Internet services, Treasure Data is available worldwide.
Treasure Data just came out of stealth mode, but it already seems well tested by its initial users and, from a financial standpoint, already has a thriving business. Its largest challenge may be teaching SMBs how they can gain business advantage from Big Data analysis. As a horizontal service with no initial focus on specific verticals, it does not yet have a good set of use cases. Its best strategy may be to identify a few verticals with large populations of SMBs where it can make a particularly strong case and does not face serious competition, develop strong use cases there, and use them to educate potential customers.
Action Item: SMBs, enterprise LOBs and similar business and governmental organizations should start educating themselves on how they can gain business advantage through Big Data analysis. With the advent of Treasure Data, Hadoop-based Big Data data warehousing is no longer restricted to multinational enterprises. Smaller competitors can achieve the same business advantages at a fraction of the price and compete against the big players.