Introduction
To derive business value from Big Data Analytics, practitioners must incorporate unstructured data & content into the equation. This was the premise put forth by Brigham Hyde of Relay Technology Management to the Wikibon community at the July 10th, 2012 Peer Incite.
Why is unstructured data & content required to derive business value from Big Data?
Video Clip (5:01) – Three Ways Unstructured Data & Content Create Business Value
There is much debate over the definition of Big Data. Some approach it from a tools and technology perspective, while others prefer the “three V’s” definition. However you define it, the end goal of any Big Data project is to deliver business value.
But where exactly does the value in Big Data lie? In a recent Wikibon Peer Incite call, Relay Technology Management’s Dr. Brigham Hyde zeroed in on the answer. Deriving value from Big Data, said Hyde, requires unifying unstructured content & data with structured data in a way that allows end-users to gain deeper insights than are possible with analysis of structured data alone.
It is widely accepted that, when it comes to analytics, more data beats smarter algorithms. We also know that more than 80% of the world’s data is unstructured or semi-structured. Therefore, to discover truly game-changing insights via analytics, unstructured and multi-structured content & data must be included in the underlying corpus.
What new tools and/or approaches are needed to harness unstructured data & content?
Video Clip (1:30) – Why Relational DBs Insufficient for Big Data Analytics
Video Clip (1:15) – Death by Manual Data Curation and Spreadsheets
Traditional relational databases and related business intelligence tools are simply not up to the task of processing and analyzing large volumes of unstructured data & content in a time-efficient or cost-effective way. Nor are manual efforts – namely, collecting data from various data repositories into multiple siloed spreadsheets – adequate for harnessing the value of Big Data.
Therefore, new types of technologies and tools are needed. Specifically, Big Data requires emerging technologies, such as MPP analytic databases, Hadoop and advanced analytic platforms, to meet the volume, velocity, and analytic complexity requirements associated with unstructured data.
There’s more to the equation, however. In some industries such as life sciences and financial services, it is not enough to simply make unstructured content & data analysis available to business analysts and data scientists alongside traditional, structured data analysis. Rather, unstructured content & data must be unified with structured data sources by a common ontological layer that allows users to understand and visualize important correlations between multiple data types.
What are Ontologies and what role do they play in Big Data analytics?
Video Clip (1:22) – Ontology Defined and Importance to Big Data Analytics
At Relay Technology Management, Hyde and his team have developed a SaaS-based application suite that allows doctors, scientists, and executives in the Life Sciences industry to better evaluate the likely success or failure of emerging drugs and related treatments. Supporting the platform is Attivio’s unified information access platform, which links unstructured content -- like journal articles, press releases, patents and SEC documents -- with structured, transactional data, such as pricing history, associated with a given new drug.
The company has also built nine ontological libraries specific to various sub-verticals in the life sciences market that essentially allow the platform to draw correlations based on sometimes subtle connections between these varying data assets. Hyde explains, “Ontologies are massively important … If I’m talking about lung cancer, it sounds like one thing. It’s not just one thing. There’s small cell lung cancer, there’s non-small cell, there’s different stages. You would also call certain types of lung cancers solid tumors because it’s a tumor type. So understanding ontologically that those things are connected and being able to relate them across relational databases and document sets into one common entity is really the crucial piece.”
In Relay’s case, for example, these ontological libraries allow end-users to not just understand if a given drug or treatment is likely to be successful from a clinical perspective, but to also take into account the state of patents or FDA approval/disapproval when scoring assets.
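The core idea Hyde describes – resolving many surface terms to one canonical entity, with broader parent concepts attached – can be sketched as a small lookup layer. The terms, entity names, and structure below are illustrative examples only, not Relay’s actual ontologies or implementation:

```python
# Minimal sketch of an ontology lookup layer. Each surface term (as it might
# appear in a document or a database field) maps to a canonical entity plus
# its broader parent concepts. All entries here are illustrative.
ONTOLOGY = {
    "lung cancer":                {"entity": "lung_cancer", "parents": []},
    "non-small cell lung cancer": {"entity": "nsclc", "parents": ["lung_cancer", "solid_tumor"]},
    "NSCLC":                      {"entity": "nsclc", "parents": ["lung_cancer", "solid_tumor"]},
    "small cell lung cancer":     {"entity": "sclc", "parents": ["lung_cancer", "solid_tumor"]},
}

def resolve(term):
    """Normalize a raw term to its canonical entity plus ancestor concepts."""
    rec = ONTOLOGY.get(term.strip()) or ONTOLOGY.get(term.strip().lower())
    if rec is None:
        return None
    return {rec["entity"], *rec["parents"]}

def related(term_a, term_b):
    """Two terms are related if their entity/ancestor sets overlap."""
    a, b = resolve(term_a), resolve(term_b)
    return bool(a and b and a & b)
```

With a mapping like this, "NSCLC" found in a journal article and "small cell lung cancer" found in a relational table both resolve to concepts under `lung_cancer`, so the platform can treat them as connected – the "one common entity" Hyde describes.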
What are the cultural challenges associated with Big Data Analytics?
Among other capabilities, Relay Technology Management's platform synthesizes the analytics of structured and unstructured data around a given pharmaceutical asset to assign a score to that asset, which Relay calls its Relative Value Index. The RVI allows end-users to more easily compare the likely success or failure of various assets against one another.
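A composite score of this kind is, at its simplest, a weighted blend of normalized signals drawn from structured and unstructured sources. The sketch below illustrates the general mechanism; the signal names and weights are hypothetical and are not Relay’s actual RVI formula:

```python
# Hypothetical weights over signals extracted from different source types.
# These names and values are illustrative, not Relay's RVI methodology.
WEIGHTS = {
    "clinical_trial_progress": 0.4,   # from structured trial databases
    "patent_strength":         0.3,   # from patent filings
    "literature_sentiment":    0.2,   # from journal articles / press releases
    "regulatory_status":       0.1,   # from FDA documents
}

def composite_score(signals):
    """Blend normalized signals (each in [0, 1]) into a single 0-100 score."""
    total = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return round(100 * total, 1)

# Example: a hypothetical drug asset with its extracted signals.
drug_asset = {
    "clinical_trial_progress": 0.8,
    "patent_strength": 0.6,
    "literature_sentiment": 0.7,
    "regulatory_status": 0.5,
}
```

Because every input signal is normalized to the same scale before weighting, scores for different assets become directly comparable – which is the point of an index like the RVI.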
However, scientists and others in the life sciences industry have been performing labor-intensive manual analytics to aid decision making for years. And, for better or worse, they are reluctant to change methods and embrace a new, more data-driven approach.
To increase user adoption, Relay understood it needed to get scientists and others to “trust” its RVI. It does so by providing users with drill-down capabilities. In other words, Relay allows users to “trust but verify.”
“Scientists in particular are cynical about data … and trends and algorithmically driven things,” said Hyde. “We give them quant, but they are just one click away from the document that is linked to that quantitative measure.”
The result is that users can gain an understanding of how and why Relay assigned a particular RVI to a particular asset, until they become confident enough in the system to begin trusting its results outright.
Another important tactic Relay uses to increase user adoption is to partner with data visualization providers Tableau Software and TIBCO Spotfire. Such data visualization tools allow users to view data in the manner that makes the most sense to them and to perform hypothesis-driven analysis.
While other techniques may be used, overcoming end-user reluctance to embrace new, data-driven decision making technologies and processes cannot be overlooked when embarking on Big Data Analytics projects.
The Bottom Line
Deriving business value from Big Data Analytics requires unifying diverse data sources, including unstructured data & content, and deriving intelligence across these previously siloed data sources. While analyzing any one data source in isolation provides only marginal value, unified Big Data Analytics returns far more powerful insights.
CIOs must also continue to improve communication and collaboration between IT and business units to ensure successful Big Data projects. Combining internal and external data sets in a meaningful way that enables insightful analysis requires the input of both data architects/engineers – those who know the data structures – and business analysts and other lines-of-business workers – those with domain knowledge who know which questions to ask the data.
Action Item: Enterprise CIOs and others responsible for business analytics practices can no longer afford to ignore unstructured data & content, or relegate it to second-class data citizenship. Such unstructured data & content must be integrated into business analytics processes at a foundational level, leveraging an ontological or semantic layer that correlates logical connections across multiple data assets regardless of type. CIOs must also be sure that Big Data platforms built today are flexible enough to handle new and emerging data types in the future and provide useful tools that allow end-users to ask and get answers to their important business questions. Only by fully integrating unstructured data & content into business analytics processes and giving users meaningful ways to derive insights will enterprises reap the true value of Big Data Analytics.