Hadoop will be at least as disruptive to the IT industry as Linux has been, says Abhishek Mehta, managing director for big data and analytics for Bank of America. One major reason is that it will allow businesses to solve problems that until now have often been regarded as insoluble.
“As a banker, I now can end fraud,” he told Wikibon Co-Founder David Vellante and SiliconAngle Founder John Furrier in an interview from Hadoop World on Siliconangle.tv. “Think big. Throw out the assumption that the big problems – eliminating fraud, mapping the spread of disease, understanding the traffic system, optimizing the energy grid – are unsolvable. They can be solved now.”
Those and similar breakthroughs will be built on the ability to analyze huge amounts of data, almost all of it unstructured, that can be captured in Hadoop-based technology. That will become the basis for a second industrial revolution based on the data factory, he says. These problems can now be solved in large part because Hadoop technology allows models to be built from analysis of an entire universe of data rather than a subset. “Sampling is finished. As a bank I can think about eliminating fraud because I can build a model looking at every incidence of fraud going back five years for every single person, rather than sampling the data, building a model, realizing there is an outlier that breaks the model, and then rebuilding the model. Those days are over.”
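To make the sampling point concrete, here is a minimal Python sketch, not anything Mr. Mehta or Bank of America described. It builds a synthetic set of transactions in which fraud is rare, then compares how many fraud cases a full scan sees with how many land in a 1% random sample. The function names, the data layout and the fraud rate are all assumptions chosen for illustration.

```python
import random

def make_transactions(n, fraud_every, seed=0):
    """Build n synthetic transactions; every `fraud_every`-th one is marked as fraud.

    The layout (id, amount, fraud flag) is hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "amount": round(rng.uniform(1, 500), 2),
            "fraud": i % fraud_every == 0,
        }
        for i in range(1, n + 1)
    ]

def fraud_ids(transactions):
    """Return the ids of all transactions flagged as fraud."""
    return {t["id"] for t in transactions if t["fraud"]}

# 100,000 transactions, of which 10 are fraudulent (an assumed rate of 0.01%).
transactions = make_transactions(n=100_000, fraud_every=10_000)

# Full scan: every fraud case is available to the model.
full = fraud_ids(transactions)

# 1% random sample: only the fraud cases that happen to be drawn are visible.
sample = random.Random(1).sample(transactions, k=1_000)
sampled = fraud_ids(sample)

print(f"fraud cases seen in full data: {len(full)}")
print(f"fraud cases seen in 1% sample: {len(sampled)}")
```

With rare events, a small sample usually contains only a fraction of the cases, which is the situation the quote describes: a model fitted to the sample can be broken by an outlier it never saw.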
Mr. Mehta, who spent six months earlier this year in Silicon Valley taking a deep dive into the operations of four of the pioneers of this new data-based economy – Facebook, Yahoo!, Twitter and gaming company Zynga – says these companies provide the model for these new data factories. “The U.S. has nothing to worry about,” he says. “The data factories are happening here.”
Data factories will drive the new economy, he says, because data is central to business. “Wal*Mart for me is a data company. Bank of America is a data company. And we are all technology companies. We may not talk that way, but we are technology companies.”
Data factories are built on three core concepts, he says: 1. You have to believe that your core asset is data. 2. You have to be able to automate the data pipeline. 3. You need to know how to monetize your data assets.
Hadoop is the big game changer, the foundation on which data factories will be built, he says. “It's a massive game changer. Google processes a terabyte an hour. Processes it. I can't even store it.”
This is vital because the new problems we face often involve huge amounts of data. The Human Genome Project, for instance, involves the analysis of three billion base pairs. Sequencing the first human genome took 10 years. The second one took three. “Today, you can do it in a week.”
Today six billion mobile phones are operating worldwide. They are access points to massive amounts of data and services in the cloud. Mr. Mehta envisions data factories becoming clouds in themselves, so Facebook is evolving into a large social cloud, the telecom companies are becoming communications clouds, and large financial services companies like Bank of America are becoming financial services clouds.
“Does mobile banking change the way people do payments today?” he asks. “The answer is yes. What model emerges remains to be seen. It's not as simple as saying that a phone can be an access point for players.” Tracking how that evolves, how different players find places in the mobile payments infrastructure, and how the billions of transactions that will originate from those cell phones are completed correctly will require the capture, processing and analysis of huge amounts of data.
The ultimate example is the Large Hadron Collider in Switzerland, which, when it is running, produces 40 terabytes of data a second, he says. “Holy cow, how do you store that?”