Big Data Market Size and Vendor Revenues

From Wikibon

By Jeff Kelly with David Vellante and David Floyer

This is the 2011 report, originally published on February 15, 2012. See Big Data Vendor Revenue and Market Forecast 2012-2017 for the 2012 update.

The Big Data market is on the verge of a rapid growth spurt that will see it top the $50 billion mark worldwide within the next five years.

As of early 2012, the Big Data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of Big Data and related analytic capabilities to gain competitive advantage and improve operational efficiencies, coupled with developments in the technologies and services that make Big Data a practical reality, will result in a compound annual growth rate (CAGR) of 58% between now and 2016.
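The link between the starting market size, the 58% growth rate, and the $50 billion figure can be checked with a short calculation. The sketch below is illustrative only. It assumes a starting value of $5.125 billion (the 2011 total in the vendor revenue table in this report) and five annual compounding periods; the function names are ours, not Wikibon's.

```python
def project_market_size(start_billions: float, cagr: float, years: int) -> float:
    """Compound a starting market size forward at a constant annual growth rate."""
    return start_billions * (1 + cagr) ** years


def implied_cagr(start_billions: float, end_billions: float, years: int) -> float:
    """Return the constant annual growth rate that links a start and end value."""
    return (end_billions / start_billions) ** (1 / years) - 1


# Assumptions: 2011 base taken from the vendor table total, five compounding periods.
START_2011_BILLIONS = 5.125
FORECAST_CAGR = 0.58
YEARS = 5

projected = project_market_size(START_2011_BILLIONS, FORECAST_CAGR, YEARS)
implied = implied_cagr(START_2011_BILLIONS, 50.0, YEARS)

print(f"Projected market size after {YEARS} years: ${projected:.1f} billion")
print(f"CAGR implied by growing to $50 billion: {implied:.1%}")
```

Under these assumptions the projection lands slightly above $50 billion, which is consistent with the forecast of topping that mark within five years.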

As explained in our Big Data Manifesto, Big Data is the new definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of Big Data, the possibilities for innovation, improved agility, and increased profitability are nearly endless.

Below is Wikibon’s five-year forecast for the Big Data market as a whole:

Figure 1 - Source: Wikibon 2012

Of the current market, Big Data pure-play vendors account for $480 million in revenue. Despite their relatively small percentage of current overall revenue (approximately 10%), these vendors – such as Vertica, Splunk, and Cloudera – are responsible for the vast majority of the innovations and modern approaches to data management and analytics that have emerged over the last several years and made Big Data the hottest sector in IT.

Wikibon considers Big Data pure-plays to be those independent hardware, software, or services vendors whose Big Data-related revenue accounts for 50% or more of total revenue. This group also includes three until-recently independent next-generation data warehouse vendors – HP Vertica, Teradata Aster, and EMC Greenplum – that largely continue to operate as autonomous entities and have not, as of yet, had their DNA polluted by their acquirers.
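The revenue test in this definition can be written as a simple threshold check. The following is a minimal sketch, not Wikibon's actual method: the function and constant names are hypothetical, the sample figures (in $US millions) are taken from the vendor table in this report, and the check covers only the 50% revenue criterion. It does not model vendor independence or the three acquired data warehouse vendors that Wikibon includes by exception.

```python
# Assumption: revenue share is the only criterion modeled here.
PURE_PLAY_THRESHOLD = 0.50


def big_data_share(big_data_revenue: float, total_revenue: float) -> float:
    """Fraction of a vendor's total revenue that comes from Big Data."""
    return big_data_revenue / total_revenue


def meets_pure_play_threshold(big_data_revenue: float, total_revenue: float) -> bool:
    """True when Big Data revenue is 50% or more of total revenue."""
    return big_data_share(big_data_revenue, total_revenue) >= PURE_PLAY_THRESHOLD


# (Big Data revenue, total revenue) in $US millions, from the 2011 vendor table.
vendors = {
    "Cloudera": (18, 18),
    "Splunk": (45, 63),
    "Teradata": (120, 2_200),
    "IBM": (953, 106_000),
}

flags = {name: meets_pure_play_threshold(*revenue) for name, revenue in vendors.items()}

for name, is_pure_play in flags.items():
    print(f"{name}: {'meets' if is_pure_play else 'does not meet'} the 50% threshold")
```

In this sample, Cloudera and Splunk clear the threshold while Teradata and IBM do not, even though IBM's absolute Big Data revenue is far larger.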

Below is a worldwide revenue breakdown of the top Big Data pure-play vendors for 2011.*

Figure 2 - Source: Wikibon 2012

Below is a breakdown of market share among the pure-play segment of the Big Data market.

Figure 3 - Source: Wikibon 2012

The current Big Data market leaders, by revenue, are IBM, Intel, and HP. These megavendors will face increased competition from established enterprise suppliers as well as from the aforementioned Big Data pure-plays, which are developing the Big Data technologies and use cases that are driving the market. It is incumbent upon Hadoop-focused pure-plays, however, to establish a profitable business model for commercializing the open source framework and related software, something that has so far proved elusive.

Below is a breakdown of current total Big Data revenue by vendor**:

Total 2011 Big Data Revenue by Vendor

| Vendor | Big Data Revenue ($US millions) | Total Revenue ($US millions) | Big Data Revenue as % of Total Revenue |
|---|---|---|---|
| IBM | $953 | $106,000 | 1% |
| Intel | $765 | $54,000 | 1% |
| HP | $513 | $126,000 | 0% |
| Fujitsu | $285 | $50,700 | 1% |
| Accenture | $273 | $21,900 | 0% |
| CSC | $160 | $16,200 | 1% |
| Dell | $154 | $61,000 | 0% |
| Seagate | $149 | $11,600 | 1% |
| EMC | $138 | $19,000 | 1% |
| Teradata | $120 | $2,200 | 5% |
| Amazon Web Services | $116 | $650 | 18% |
| SAS Institute | $115 | $2,700 | 1% |
| Capgemini | $111 | $12,100 | 1% |
| Hitachi | $110 | $100,000 | 0% |
| SAP | $85 | $17,000 | 0% |
| Opera Solutions | $76 | $100 | 76% |
| NetApp | $75 | $5,000 | 0% |
| Atos S.A. | $75 | $7,400 | 1% |
| Huawei | $73 | $21,800 | 0% |
| Siemens | $69 | $102,000 | 0% |
| Xerox | $67 | $6,700 | 1% |
| Tata Consultancy Services | $61 | $6,300 | 1% |
| SGI | $60 | $690 | 9% |
| Logica | $60 | $6,000 | 1% |
| Mu Sigma | $55 | $65 | 85% |
| Microsoft | $50 | $70,000 | 0% |
| Oracle | $50 | $36,000 | 0% |
| Splunk | $45 | $63 | 68% |
| 1010data | $25 | $30 | 83% |
| Supermicro | $23 | $943 | 2% |
| MarkLogic | $20 | $80 | 25% |
| Cloudera | $18 | $18 | 100% |
| Red Hat | $18 | $1,100 | 2% |
| Informatica | $17 | $750 | 2% |
| Calpont | $15 | $25 | 60% |
| ClickFox | $11 | $35 | 31% |
| Fractal Analytics | $12 | $12 | 100% |
| Pervasive Software | $10 | $50 | 20% |
| Tableau Software | $10 | $72 | 14% |
| Think Big Analytics | $8 | $8 | 100% |
| MapR | $7 | $7 | 100% |
| Digital Reasoning | $6 | $6 | 100% |
| ParAccel | $5 | $11 | 45% |
| Couchbase | $5 | $6 | 84% |
| DataStax | $4.5 | $4.5 | 100% |
| 10gen | $4.5 | $4.5 | 100% |
| Datameer | $4 | $4 | 100% |
| Hortonworks | $3 | $3 | 100% |
| RainStor | $2.5 | $2.5 | 100% |
| Attivio | $2.5 | $19 | 13% |
| QlikTech | $2 | $300 | 1% |
| HPCC Systems | $2 | $2 | 100% |
| Karmasphere | $2 | $2 | 100% |
| Other | $25 | n/a | n/a |
| Total | $5,125 | $866,671 | 1% |

Notes on the above table: There have been several questions from the community about this data, how it was derived, and why certain vendors were quantified as they were. The following notes capture some of the highlights of these discussions.

  • Intel, Seagate, and Supermicro have large shares due to the propensity for Big Data scale-out clusters to use off-the-shelf components and white box solutions. This is especially true for Internet giants like Google and Facebook.
  • IBM's strong showing reflects a large proportion of services revenue, driven by the company's extensive professional services portfolio. IBM's analytics software business is also a significant contributor to its Big Data initiatives.
  • Oracle's figures include Exadata and Exalogic because they are non-traditional approaches to handling large data. However, not all the revenue from these products is included; we estimated only the revenue associated with large-capacity deployments.
  • Next-generation enterprise data warehouse vendor revenue from Vertica, Greenplum, and Aster Data was included in their parent companies' overall Big Data revenue numbers.
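One point that may help when reading the table: a 0% entry in the percentage column does not mean a vendor has no Big Data revenue, only that its share is below half a percent. The sketch below illustrates this for a handful of sample rows. It assumes the column is rounded to the nearest whole percent, which the report does not state explicitly, and the function name is ours. Only the rows shown here were checked against that assumption.

```python
def percent_of_total(big_data_revenue: float, total_revenue: float) -> int:
    """Big Data revenue as a whole-number percentage of total revenue.

    Assumption: nearest-whole-percent rounding; the report does not state its rule.
    """
    return round(100 * big_data_revenue / total_revenue)


# (Big Data revenue, total revenue) in $US millions, sample rows from the table.
sample_rows = {
    "IBM": (953, 106_000),
    "HP": (513, 126_000),
    "Teradata": (120, 2_200),
    "Amazon Web Services": (116, 650),
    "Total": (5_125, 866_671),
}

rounded = {vendor: percent_of_total(*row) for vendor, row in sample_rows.items()}

for vendor, (big_data, total) in sample_rows.items():
    exact = 100 * big_data / total
    print(f"{vendor}: {exact:.2f}% exact, shown as {rounded[vendor]}%")
```

HP, for example, has more than $500 million in estimated Big Data revenue, yet that is roughly 0.4% of its total revenue and so appears as 0%.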

Wikibon initiated this research in an effort to provide some guidance to the community on the size of the Big Data market. Everyone is buzzing about Big Data, which raises the question: "How big is the Big Data market?" We searched but were unable to find any market size information and felt that putting forth a top-down and bottom-up analysis would be useful. Putting a 'stake in the ground' on the market size will also, we hope, generate further discussion in the community and help us fine-tune the market estimates. All credible input will be assessed and acted upon quickly.

Regarding methodology, the Big Data market size, forecast, and related market-share data was determined based on extensive research of public revenue figures, media reports, interviews with vendors and resellers regarding customer pipelines, product roadmaps, and feedback from the Wikibon community of IT practitioners. Many vendors were not able or willing to provide exact figures for our Big Data definition, and because many of the pure-plays are privately held it was necessary for Wikibon to triangulate many sources of information to determine our final figures. Wikibon defines big data to include data sets whose size and type make them impractical to process and analyze with traditional database technologies and related tools. The big data market, therefore, includes those technologies, tools, and services designed to address these shortcomings. These include:

  • Hadoop distributions, software, subprojects and related hardware;
  • Next-generation data warehouses and related hardware;
  • Big data analytic platforms and applications;
  • Business intelligence, data mining and data visualization platforms and applications as applied to Big Data;
  • Data integration platforms and tools as applied to Big Data;
  • Big Data support, training, and professional services.

While this is an admittedly broad market definition, most core Big Data technologies and tools share some combination of the following characteristics. They take advantage of commodity hardware to enable scale-out, parallel processing techniques; employ some level of non-relational data model in order to process unstructured and semi-structured data; take advantage of columnar data storage and/or data compression capabilities to improve query efficiency; and are interoperable with business analytics and data visualization technologies to convey insights to end-users.

Below is a breakdown of Big Data revenue by hardware, software, and services.

Figure 4 - Source: Wikibon 2012

Pure-plays Delivering Big Data Innovation

While IT heavyweights IBM and Intel currently lead the Big Data market in overall revenue, this is mainly due to their breadth of offerings and entrenchment in many enterprise data centers, and, in the case of Intel, the propensity of Big Data projects to use commodity x86 servers. In addition, IBM's emphasis on analytics and its huge services portfolio are driving much of the company's Big Data revenue. Moreover, the market is immature, with smaller Big Data pure-plays just ramping up their go-to-market strategies.

The most impactful innovations in the Big Data market are in fact coming from the numerous pure-play vendors that, as of now, own just a small share of the overall market. While not all will succeed in the long term, and some have yet to deliver any significant revenue, Wikibon expects many of these vendors to enjoy rapid growth over the next five years as their offerings, support services, and sales channels mature. Of course, this also means each and every Big Data pure-play is a potential acquisition target of megavendors IBM, Oracle, HP, EMC, and others. As has happened in other fast-growing markets, such as the business intelligence market in 2007-2008, the Big Data market will experience significant consolidation within the next three-to-five years. The acquiring vendors would be wise to allow current Big Data pure-plays to continue operating and, more importantly, innovating as largely independent entities, or risk stifling the very innovation that is fueling the big data market’s tremendous growth.

Below are specific examples of the innovations being driven by Big Data pure-plays:

Hadoop distributions

Cloudera and Hortonworks are responsible for the majority of contributions to the Apache Hadoop project that are significantly improving the open source Big Data framework’s performance capabilities and enterprise-readiness.

Cloudera, for example, contributes significantly to Apache HBase, the Hadoop-based non-relational database that allows for low-latency, quick lookups. The latest of these contributions, on which Cloudera engineers worked, is HFile v2, a series of patches that improve HBase storage efficiency.

Hortonworks engineers are working on a Next Generation MapReduce architecture that promises to increase the maximum Hadoop cluster size beyond its current practical limitation of 4,000 nodes, as well as add some level of real-time streaming data analysis capabilities.

MapR takes a more proprietary approach to Hadoop, supplementing HDFS with its API-compatible DirectAccess NFS in its enterprise Hadoop distribution, adding significant performance and uptime capabilities.

Next Generation Data Warehousing

The three leading, until recently independent Next Generation Data Warehouse vendors – Vertica, Greenplum, and Aster Data – are upending the traditional enterprise data warehouse market with massively parallel, columnar analytic databases that deliver lightning-fast data loading and near real-time query capabilities.

The latest iteration of the Vertica Analytic Platform, Vertica 5.0, for example, includes new elasticity capabilities to easily expand or contract deployments and a slew of new in-database analytic functions.

Aster Data has pioneered a novel SQL-MapReduce framework, combining the best of both data processing approaches, while Greenplum’s unique collaborative analytic platform, Chorus, provides a social environment for Data Scientists to experiment with Big Data.

All three vendors experienced significant revenue growth over the last two-to-three years, with Vertica leading the way with an estimated $84 million in revenue in 2011, followed by Aster Data with $52 million, and Greenplum with $40 million.

Big Data Analytic Platforms and Applications

A handful of up-and-coming vendors are developing applications and platforms that leverage the underlying Hadoop infrastructure to provide both Data Scientists and “regular” business users with easy-to-use tools for experimenting with Big Data. Less mature is the market for polished end-user Big Data applications.

Datameer is gaining significant traction with its Hadoop-based business intelligence platform that leverages a familiar spreadsheet-like interface to allow non-power users to manipulate or otherwise analyze Hadoop-based data; Digital Reasoning, whose Synthesis platform sits on top of Hadoop to analyze text-based communication, is well entrenched in the government sector and is poised to expand to more traditional enterprises. Karmasphere has developed an analytic development platform that allows Data Scientists to perform ad hoc queries on Hadoop-based data via a SQL interface.

Big-Data-as-a-Service

Big-Data-as-a-Service is developing rapidly thanks to vendors such as Tresata, 1010data, and ClickFox. Cloud-based Big Data applications and services have the potential to allow small and mid-sized businesses, as well as enterprises that lack internal Big Data expertise, to take advantage of Big Data processing and analytic capabilities without needing to deploy and manage on-premises hardware or software.

Tresata’s cloud-based platform, for example, leverages Hadoop to process and analyze large volumes of its customers’ financial data, including enriching it with third-party data such as stock market data, and returns results via on-demand visualizations for banks, financial data companies, and other financial services companies.

1010data offers a cloud-based application that allows business users and analysts to manipulate data in the familiar spreadsheet format but at Big Data scale. And the ClickFox platform mines large volumes of customer touch-point data to map the total customer experience with visuals and analytics delivered on-demand.

Non-Hadoop Big Data Platforms

Other non-Hadoop vendors contributing significant innovation to the Big Data landscape include:

  • Splunk, which specializes in processing and analyzing log file data to allow administrators to monitor IT infrastructure performance and identify bottlenecks and other disruptions to service;
  • HPCC Systems, a spin-off of LexisNexis, which offers a Big Data framework that competes with Hadoop. LexisNexis engineers built the framework internally over the last ten years to help the company process and analyze large volumes of data for its clients in finance, utilities, and government;
  • DataStax, which offers a commercial version of the open source Apache Cassandra NoSQL database along with related support services bundled with Hadoop.

There are, of course, many other Big Data start-ups that are too early in their existence to qualify for inclusion in this revenue report but which are nonetheless off to promising starts. Among them are Sqrrl, Aerospike, Platfora, Continuity, Hadapt, and Wibidata. Enterprises should keep a close eye on these and other Big Data pure-plays as they continue to develop innovative but practical Big Data platforms, applications, and services.

Action Item: The Big Data market is exploding, not only in terms of marketing hype but also in real revenue. While reasonable people can debate definitions and overall market sizes, one thing is clear: Big Data is a large and fast-growing market. For IT practitioners, it means investigating ways to monetize the data sources at your organization and obtaining the skills necessary to achieve that objective. For the vendor community, it means you need a credible Big Data story, with a roadmap that offers clear business value and the flexibility to move with this fast-growing space.

Footnotes: Note: All revenue figures are worldwide.
* Figures exclude non-Big Data revenue.
** All revenue figures are estimates based on the above criteria, with the exception of the total revenue of public companies, which is a matter of public record.

Wikibon Feb 22, 2012 Press Release for this report (originally posted Feb 10, 2012).

Jeff Kelly is a Principal Research Contributor who focuses on trends in Big Data and Business Analytics technologies. Reach Jeff on Twitter at @jeffreyfkelly.
