Cloud and Big Data are two terms frequently mentioned in the same breath. But the reality is that the vast majority of Big Data deployments, including Hadoop deployments, sit behind the firewall in bare-metal environments.
As Wikibon has written before, the public cloud is a natural environment for Big Data analytics proof-of-concept projects, such as exploratory analytics against cloud-based data sources and building and testing new Big Data applications. The leading public cloud provider, Amazon Web Services (AWS), has invested heavily in its Big Data services to attract such projects. This investment includes the development of Kinesis, a real-time streaming data service, and Redshift, a highly scalable data warehouse-as-a-service offering.
Not to be outdone, Google this week announced significant price cuts to its flagship Big Data analytics service, BigQuery, along with a new streaming analytics capability to compete with Kinesis; AWS quickly responded with price cuts of its own. Both moves are welcome developments for Big Data practitioners. A competitive environment of multiple cloud providers for Big Data services is critical to keep any one provider (namely AWS) from gaining lock-in power over customers.
But for Big Data in the public cloud to gain traction, deployments must not be limited to experimental projects. That means cloud service providers must improve their security and compliance capabilities, as Big Data deployments almost by definition involve storing, processing and analyzing sensitive data sets. In addition, cloud service providers need to do a better job of providing end-to-end services for Big Data projects, from data ingestion through visualization.
This is also an opportunity for Big Data vendors to provide complete Big Data managed services from multiple clouds. One vendor already taking this approach is Treasure Data, which offers a cloud-based, end-to-end managed service for large-scale data warehousing. The service spans streaming data ingestion from on-premises data centers; data processing, storage and analysis; data visualization and business intelligence (via connections to popular providers such as Tableau Software); and, if desired, moving data back to corporate data centers.
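To make the end-to-end flow concrete, here is a minimal sketch of what such a pipeline can look like from the customer's side: an on-premises application streams events to the provider's ingestion endpoint, and the warehoused data is later queried with ordinary SQL from a BI tool. The endpoint URL, credential and table name below are hypothetical placeholders for illustration, not Treasure Data's actual API.

import json
import time

import requests  # third-party HTTP library (pip install requests)

# Hypothetical ingestion endpoint and credential -- placeholders only,
# not any specific vendor's actual API.
INGEST_URL = "https://ingest.example-managed-service.com/v1/events/web_logs"
API_KEY = "YOUR_API_KEY"

def stream_event(event):
    """Send one event from an on-premises application to the managed cloud service."""
    response = requests.post(
        INGEST_URL,
        headers={"Authorization": "Bearer " + API_KEY,
                 "Content-Type": "application/json"},
        data=json.dumps(event),
        timeout=5,
    )
    response.raise_for_status()

# Stream a page-view event as it happens on premises.
stream_event({"ts": int(time.time()), "user_id": "u-123", "path": "/pricing"})

# Later, an analyst or a BI tool such as Tableau queries the managed warehouse
# with SQL; provisioning, tuning and scaling are the provider's job.
DAILY_TRAFFIC_SQL = """
SELECT date(from_unixtime(ts)) AS day, COUNT(*) AS page_views
FROM web_logs
GROUP BY 1
ORDER BY 1
"""

The notable thing about the sketch is what is absent: no cluster provisioning, tuning or capacity planning appears anywhere in the customer's code, which is precisely the appeal of the managed-service model described above.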
Such services remove major bottlenecks to successful Big Data deployments, including the shortage of skilled practitioners to provision and maintain complex Big Data platforms and technologies. They also allow enterprises to spend more of their time executing their core business and less running technology, and they reduce time-to-insight for Big Data analytics projects.
But the market currently has few other such managed service providers. In addition to large-scale data warehousing, Wikibon sees a major opportunity for fully managed services for other Big Data functions, including data science/advanced analytics and Big Data application development.
Action Item: Big Data practitioners struggling to get projects off the ground due to challenges such as provisioning hardware, tuning systems and maintaining performance should explore cloud services such as those offered by AWS, Google and Treasure Data, as well as larger enterprise players such as IBM/SoftLayer that offer bare-metal services. Special attention should be paid to security and privacy capabilities, as well as the comprehensiveness of the services provided.