Big Data Market Size and Vendor Revenues

From Wikibon

Revision as of 21:16, 23 February 2012 by Jeff (Talk | contribs)

By Jeff Kelly with David Vellante and David Floyer


The big data market is on the verge of a rapid growth spurt that will see it top the $50 billion mark worldwide within the next five years.

As of early 2012, the big data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of big data and related analytic capabilities to gain competitive advantage and to improve operational efficiencies, coupled with developments in the technologies and services that make big data a practical reality, will result in a super-charged CAGR of 58% between now and 2017.
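The growth rate quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes rounded endpoints of $5 billion (2012) and $50 billion (2017) taken from the text, and the function names `cagr` and `project` are hypothetical helpers, not part of Wikibon's methodology.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.58 means 58%)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> list[float]:
    """Values for year 0..years when start_value compounds at a constant rate."""
    return [start_value * (1 + rate) ** year for year in range(years + 1)]


# Assumption: rounded endpoints from the text, in $US billions.
implied_rate = cagr(5.0, 50.0, 5)
print(f"Implied CAGR: {implied_rate:.1%}")

# A smooth path at a constant 58% rate. This is only an illustration;
# the actual year-by-year forecast is the one shown in Figure 1.
for offset, value in enumerate(project(5.0, 0.58, 5)):
    print(2012 + offset, f"${value:.1f}B")
```

A tenfold increase over five years implies a rate in the high 50s, which is consistent with the 58% figure. A constant-rate path is a simplification, since real forecasts rarely grow by the same percentage every year.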

As explained in our Big Data Manifesto, big data is the new definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of big data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless.

Below is Wikibon’s five-year forecast for the big data market as a whole:

Figure 1 - Source: Wikibon 2012


Of the current market, big data pure-play vendors account for $310 million in revenue. Despite their relatively small percentage of current overall revenue (approximately 5%), these vendors – such as Vertica, Splunk, and Cloudera – are responsible for the vast majority of new innovations and modern approaches to data management and analytics that have emerged over the last several years and made big data the hottest sector in IT.

Wikibon considers big data pure-plays as those independent hardware, software, or services vendors whose big data-related revenue accounts for 50% or more of total revenue. This group also includes three until-recently independent next-generation data warehouse vendors – HP Vertica, Teradata Aster, and EMC Greenplum – that largely continue to operate as autonomous entities and have not, as of yet, had their DNA polluted by their acquirers.

Below is a worldwide revenue breakdown of the top big data pure-play vendors as of February 2012.*

Figure 2 - Source: Wikibon 2012


Below is a breakdown of market share among the pure-play segment of the big data market.


Figure 3 - Source: Wikibon 2012


The current big data market leaders, by revenue, are IBM, Intel, and HP. These megavendors will face increased competition from established enterprise suppliers as well as from the aforementioned big data pure-plays, which are developing the big data technologies and use cases that are driving the market. It is incumbent upon Hadoop-focused pure-plays, however, to establish a profitable business model for commercializing the open source framework and related software, which to date has been elusive.

Below is a breakdown of current total big data revenue by vendor**:

Total 2012 Big Data Revenue by Vendor

| Vendor | Big Data Revenue (in $US millions) | Total Revenue (in $US millions) | Big Data Revenue as Percentage of Total Revenue |
|---|---|---|---|
| IBM | $1,100 | $106,000 | 1% |
| Intel | $765 | $54,000 | 1% |
| HP | $550 | $126,000 | 0% |
| Oracle | $450 | $36,000 | 1% |
| Teradata | $220 | $2,200 | 10% |
| Fujitsu | $185 | $50,700 | 1% |
| CSC | $160 | $16,200 | 1% |
| Accenture | $155 | $21,900 | 0% |
| Dell | $150 | $61,000 | 0% |
| Seagate | $140 | $11,600 | 1% |
| EMC | $140 | $19,000 | 1% |
| Capgemini | $111 | $12,100 | 1% |
| Hitachi | $110 | $100,000 | 0% |
| Atos S.A. | $75 | $7,400 | 1% |
| Huawei | $73 | $21,800 | 0% |
| Siemens | $69 | $102,000 | 0% |
| Xerox | $67 | $6,700 | 1% |
| Tata Consultancy Services | $61 | $6,300 | 1% |
| SGI | $60 | $690 | 9% |
| Logica | $60 | $6,000 | 1% |
| Microsoft | $50 | $70,000 | 0% |
| Splunk | $45 | $63 | 68% |
| 1010data | $25 | $30 | 83% |
| MarkLogic | $20 | $80 | 25% |
| Cloudera | $18 | $18 | 100% |
| Red Hat | $18 | $1,100 | 2% |
| Informatica | $17 | $750 | 2% |
| 1010data | $25 | $30 | 50% |
| SAS Institute | $15 | $2,700 | 1% |
| Amazon Web Services | $14 | $650 | 2% |
| ClickFox | $11 | $35 | 31% |
| Super Micro | $11 | $540 | 2% |
| SAP | $10 | $17,000 | 0% |
| Think Big Analytics | $8 | $12 | 67% |
| MapR | $7 | $7 | 100% |
| Digital Reasoning | $6 | $12 | 50% |
| Pervasive Software | $5 | $50 | 10% |
| Hortonworks | $3 | $3 | 100% |
| DataStax | $3 | $3 | 100% |
| Attivio | $2.5 | $19 | 13% |
| QlikTech | $2 | $300 | 1% |
| HPCC Systems | $2 | $2 | 100% |
| Datameer | $2 | $2 | 100% |
| Karmasphere | $2 | $2 | 100% |
| Tableau Software | $1.5 | $72 | 2% |
| NetApp | $1.5 | $5,000 | 0% |
| Other | $25 | n/a | n/a |
| Total | $5,051 | $866,070 | 1% |
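
The percentage column in the table above is simply big data revenue divided by total revenue, and the 50% pure-play threshold described earlier can be applied to the same ratio. The following is a minimal sketch using four rows copied from the table. The `Vendor` class and its property names are hypothetical, and the check covers only the revenue threshold, not Wikibon's additional requirement that a pure-play be independent (or operate autonomously after acquisition).

```python
from dataclasses import dataclass

# Threshold taken from the pure-play definition in the text.
PURE_PLAY_THRESHOLD = 0.50


@dataclass
class Vendor:
    name: str
    big_data_revenue: float  # $US millions
    total_revenue: float     # $US millions

    @property
    def share(self) -> float:
        """Big data revenue as a fraction of total revenue."""
        return self.big_data_revenue / self.total_revenue

    @property
    def meets_threshold(self) -> bool:
        """True when big data revenue is 50% or more of total revenue."""
        return self.share >= PURE_PLAY_THRESHOLD


# Sample rows copied from the table above.
vendors = [
    Vendor("IBM", 1100, 106000),
    Vendor("Teradata", 220, 2200),
    Vendor("MarkLogic", 20, 80),
    Vendor("Cloudera", 18, 18),
]

for v in vendors:
    print(f"{v.name}: {v.share:.0%}, meets 50% threshold: {v.meets_threshold}")
```

Because the table rounds to whole percentages, a vendor such as IBM shows 1% even though big data accounts for well over $1 billion of its revenue.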


Wikibon initiated this research in an effort to provide some guidance to the community on the size of the market. Everyone is buzzing about big data, which raises the question: "How big is the big data market?" We searched but were unable to find any market size information and felt that putting forth a top-down and bottom-up analysis would be useful. Putting a 'stake in the ground' on the market size will also, we hope, generate further discussion in the community and help us fine-tune the market estimates. All credible input will be assessed and acted upon quickly.

Regarding methodology, the big data market size, forecast, and related market-share data was determined based on extensive research of public revenue figures, media reports, interviews with vendors and resellers regarding customer pipelines, product roadmaps, and feedback from the Wikibon community of IT practitioners. Many vendors were not able or willing to provide exact figures for our big data definition, and because many of the pure plays are privately held it was necessary for Wikibon to triangulate many sources of information to determine our final figures.

Wikibon defines big data to include data sets whose size and type make them impractical to process and analyze with traditional database technologies and related tools. The big data market, therefore, includes those technologies, tools, and services designed to address these shortcomings. These include:

  • Hadoop distributions, software, subprojects and related hardware;
  • Next-generation data warehouses and related hardware;
  • Data integration tools and platforms as applied to big data;
  • Big data analytic platforms, applications, and data visualization tools;
  • Big data support, training, and professional services.

While this is an admittedly broad market definition, most core big data technologies and tools share some combination of the following characteristics. They take advantage of commodity hardware to enable scale-out, parallel processing techniques; employ some level of non-relational and/or columnar data storage capabilities in order to process unstructured and semi-structured data; and apply advanced analytics and data visualization technology to convey insights to end-users.

Below is a breakdown of big data revenue by hardware, software, and services.

Figure 4 - Source: Wikibon 2012


Pure-plays Delivering Big Data Innovation

While IT heavyweights IBM and Intel currently lead the big data market in overall revenue, this is mainly due to their breadth of offerings and entrenchment in many enterprise data centers, and, in the case of Intel, the propensity of big data projects to use commodity x86 servers. In addition, IBM's emphasis on analytics and its huge services portfolio are driving much of the company's big data revenue. Moreover, the market is immature, with smaller big data pure-plays just ramping up their go-to-market strategies.

The most impactful innovations in the big data market are in fact coming from the numerous pure-play vendors that, as of now, own just a small share of the overall market. While not all will succeed in the long term, and some have yet to deliver any significant revenue, Wikibon expects many of these vendors to enjoy rapid growth over the next five years as their offerings, support services, and sales channels mature. Of course, this also means each and every big data pure-play is a potential acquisition target of megavendors IBM, Oracle, HP, EMC, and others. As has happened in other fast-growing markets, such as the business intelligence market in 2007-2008, the big data market will experience significant consolidation within the next three-to-five years. The acquiring vendors would be wise to allow current big data pure-plays to continue operating and, more importantly, innovating as largely independent entities, or risk stifling the very innovation that is fueling the big data market’s tremendous growth.

Below are specific examples of the innovations being driven by big data pure-plays:

Hadoop distributions
Cloudera and Hortonworks are responsible for the majority of contributions to the Apache Hadoop project that are significantly improving the open source big data framework’s performance capabilities and enterprise-readiness.

Cloudera, for example, contributes significantly to Apache HBase, the Hadoop-based non-relational database that allows for low-latency, quick lookups. The latest of these contributions from Cloudera engineers is HFile v2, a series of patches that improve HBase storage efficiency.

Hortonworks engineers are working on a next-generation MapReduce architecture that promises to increase the maximum Hadoop cluster size beyond its current practical limitation of 4,000 nodes.

MapR takes a more proprietary approach to Hadoop, supplementing HDFS with its API-compatible Direct Access NFS in its enterprise Hadoop distribution, adding significant performance capabilities.

Next Generation Data Warehousing
The three leading, until-recently independent next-generation data warehouse vendors – Vertica, Greenplum, and Aster Data – are upending the traditional enterprise data warehouse market with massively parallel, columnar analytic databases that deliver lightning-fast data loading and real-time analytic capabilities.

The latest iteration of the Vertica Analytic Platform, Vertica 5.0, for example, includes new elasticity capabilities to easily expand or contract deployments and a slew of new in-database analytic functions.

Aster Data has pioneered a novel SQL-MapReduce framework, combining the best of both data processing approaches, while Greenplum’s unique collaborative analytic platform, Chorus, provides a social environment for data scientists to experiment with big data.

All three vendors experienced significant revenue growth over the last two-to-three years, with Vertica leading the way with an estimated $84 million in revenue in 2011, followed by Aster Data with $50 million, and Greenplum with $40 million.

Big Data Analytics Platforms and Applications
A handful of up-and-coming vendors are developing applications and platforms that leverage the underlying Hadoop infrastructure to provide both data scientists and “regular” business users with easy-to-use tools for experimenting with big data.

These include Datameer, which has developed a Hadoop-based business intelligence platform with a familiar spreadsheet-like interface; Karmasphere, whose platform allows data scientists to perform ad hoc queries on Hadoop-based data via a SQL interface; and Digital Reasoning, whose Synthesis platform sits on top of Hadoop to analyze text-based communication.

Big-Data-as-a-Service
Big-Data-as-a-Service is developing rapidly thanks to vendors such as Tresata, 1010data, and ClickFox. Cloud-based applications and services are increasingly allowing small and mid-sized businesses to take advantage of big data without needing to deploy on-premises hardware or software.

Tresata’s cloud-based platform, for example, leverages Hadoop to process and analyze large volumes of financial data and returns results via on-demand visualizations for banks, financial data companies, and other financial services companies.

1010data offers a cloud-based application that allows business users and analysts to manipulate data in the familiar spreadsheet format but at big data scale. And the ClickFox platform mines large volumes of customer touch-point data to map the total customer experience with visuals and analytics delivered on-demand.

Non-Hadoop Big Data Platforms
Other non-Hadoop vendors contributing significant innovation to the big data landscape include:

  • Splunk, which specializes in processing and analyzing log file data to allow administrators to monitor IT infrastructure performance and identify bottlenecks and other disruptions to service;
  • HPCC Systems, a spin-off of LexisNexis, that offers a competing big data
