Abhi Mehta thinks the world is on the verge of a second industrial revolution. But this time, the raw materials aren't steel and iron but data and information.
Mehta is the founder of Tresata, a start-up that applies the assembly line model of the early 20th century to the huge volumes of data being created both inside the enterprise and in the cloud here in the 21st century. The goal is to create “data factories” that will help supply organizations with timely, accurate data that can be used to solve pressing business problems.
“Data is by far the core purveyor or keeper of value,” said Mehta. “That's a reality that's becoming very, very clear.”
But in order for organizations to unlock the value of big data – click-stream data, Tweets, product data, and more – they have to find a way to access, clean, and deliver it first. Mehta has developed a three-step process, which he calls automating the data pipeline, that he says will do just that.
First, Tresata works with clients to identify the data needed to solve particular business problems, then collects and indexes the data as part of a massive “storage dump.” Then Tresata runs the data through its data assembly line, which Mehta says relies on custom-built algorithms to identify and resolve anomalies and other data quality issues on large volumes of data. Finally, the data is made available to data scientists and others for analysis with the tools of their choice.
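Tresata's actual system is proprietary, but the three stages described above can be sketched in miniature. Everything here – the record layout, the field names, the cleaning rules – is a hypothetical illustration, not Tresata's implementation:

```python
# A toy three-stage data pipeline: ingest/index, clean, deliver.
# Record structure and cleaning rules are illustrative assumptions only.

def ingest(raw_records):
    """Stage 1: collect raw records and index them by position."""
    return {i: rec for i, rec in enumerate(raw_records)}

def clean(index):
    """Stage 2: resolve simple anomalies (whitespace, casing, missing fields)."""
    cleaned = {}
    for key, rec in index.items():
        name = (rec.get("name") or "").strip().title()
        if not name:
            continue  # drop records with no usable name
        cleaned[key] = {"name": name, "amount": float(rec.get("amount", 0))}
    return cleaned

def deliver(cleaned):
    """Stage 3: hand analysts a stable, ordered view of the data."""
    return sorted(cleaned.values(), key=lambda r: r["name"])

raw = [
    {"name": "  alice JONES ", "amount": "42.5"},
    {"name": "", "amount": "10"},   # anomalous: no usable name
    {"name": "bob smith", "amount": "7"},
]
print(deliver(clean(ingest(raw))))
```

The point of the assembly-line framing is that each stage has one job and passes a well-defined product to the next, so any stage can be scaled or swapped out independently.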
At first blush, it doesn't sound that different from what data integration and data quality vendors have been helping companies do for years. But the sheer volume and unstructured nature of data today demands a new approach, Mehta says. The old data management way of collecting and scrubbing data just won't do in the big data era.
So Mehta approaches data management in the age of big data like Henry Ford approached building cars. “We take a manufacturing approach,” Mehta says.
Tresata's initial focus is on applying the data pipeline to the financial services industry, not surprising since Mehta spent a number of years as Bank of America's big data guru. One application of the pipeline is to help financial services firms identify up-sell opportunities. Tresata will help banks, for example, collect, index, clean, and deliver data needed to optimally price products and match them with the right customers to maximize profit.
Mehta says Tresata's approach – which he hopes to apply to retail and healthcare clients as well as financial services firms – will help companies understand customers on an individual basis, based on the plethora of social media, transactional, and other data that can now be stored in the cloud thanks to Hadoop and related technologies. “No two people are identical,” Mehta said. “The ability to look at things on an individual level is game changing.”
The real differentiator for Tresata, in my view, is the second step in its data assembly line process – cleaning the data. Data quality has been a thorn in the side of database administrators and others for years. Despite the many data quality tools that have been developed, bad data is still one of the leading reasons data warehousing and other analytic initiatives fail. And that's in scenarios with relatively small amounts of structured data.
If Mehta and Tresata have figured out a way to effectively clean large volumes of mostly unstructured data, they could be on to something big.
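To see why cleaning is the hard step, consider one classic data-quality problem: the same customer appearing under slightly different names. The toy example below flags likely duplicates with Python's standard-library difflib; the customer names and the 0.85 similarity threshold are assumptions for illustration, and a system at the scale the article describes would need distributed, purpose-built algorithms rather than pairwise comparison:

```python
# Toy duplicate detection on messy customer names using fuzzy string matching.
# Threshold and data are illustrative; this brute-force O(n^2) scan would not
# scale to the data volumes discussed in the article.
from difflib import SequenceMatcher

def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity ratio meets the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
            if score >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

customers = ["Jonathan Q. Smith", "Jonathon Q Smith", "Maria Garcia"]
print(likely_duplicates(customers))
```

Even this trivial case shows the judgment calls involved: set the threshold too low and distinct customers merge, too high and duplicates slip through – which is why automated cleaning of unstructured data at scale is a genuinely hard problem.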
Action Item: Financial, retail, and other organizations that serve large and varied customer bases should waste no time in exploring ways to harness big data to find new revenue opportunities. Those that don't will quickly fall behind. As the era of big data progresses, new approaches to managing and utilizing large volumes of distributed, unstructured data will emerge, and Tresata's data factory approach is a promising start worth evaluating.