In the IT world, there are few ideas that are truly revolutionary. Even the big waves of the Internet and the latest mobile craze started with sharing text, then added pictures, then music and ultimately video. VMware’s CEO Paul Maritz said that anyone over 40 could think of cloud computing as a “software mainframe”. One of the exciting things about technology is reaching an inflection point that allows for great advancements and new uses of old ideas (see the history of Twitter as an example). Cloudera launched its Hadoop distribution in 2009, and the first mentions of the “big data” catchphrase that I can find are from 2009/2010. Wikibon has looked closely at how the new solutions differ from traditional data warehousing (see David Floyer’s definition and other free research on the Wikibon site). The explosion of data has been building for many years, and while data scientists who can help turn data into usable information have been around for decades, they are coming to the forefront with the big data inflection point. EMC is looking to use its marketing and financial muscle to stake a leadership position in the big data ecosystem, including an initiative to certify data scientists. While there is a need to grow the pool, true data scientists leverage a mix of math, science and hacking, and EMC is not a hacker company.
Wikibon’s Jeff Kelly has been following the BI/data analytics space for a number of years and wrote an excellent piece analyzing EMC’s move into selling and supporting Hadoop. EMC knows that it has to build authority in the open source community; EMC President Pat Gelsinger discusses in the video below how the company will use acquisitions of companies and people to build credibility:
Shevek is himself a data scientist, and at EMC World he was telling me about research and papers he wrote over 15 years ago that are now coming to fruition. EMC announced that it will be “certifying” data scientists. EMC is a long-time resident of the information universe, and its acquisitions of Greenplum and Isilon have given it a strong standing in the big data ring, but does it have the chops to create a new generation of data scientists? Hilary Mason, chief scientist at bit.ly, gave a great explanation of what a data scientist is:
I see data science as a combination of maths, computer science so you can code things that actually function, statistics, and finally just hacking. And I think that last one is by far the most important. If you’re the kind of person who can say, “I have some cool data. I really am curious about some questions about that data. I’m going to figure this out.” Then yes, you can do it.
EMC garnered some good buzz in the space by holding a Data Scientist Summit. While EMC has the knowledge and industry position to create training for the new field of cloud architects, it must build credibility in the open source and hacking communities for a data scientist certification to be more than good marketing. The money and focus that enterprise vendors are putting into big data has the opportunity to accelerate innovation and growth in the space. IBM has a broad portfolio here, including data scientists from Cognos and big data assets from Netezza, and HP is going into the enterprise big data space with Vertica. Companies should work to understand how new tools and data scientists can help create value, and should recognize that the value of a data scientist is much more than the title.