Imagine being able to predict the future. To foresee market trends. To identify people's wants and needs before their cravings hit. What if a company could explain the correlations between its business actions and seemingly unrelated spikes in sales across different sectors? One could confidently chart a path to success by answering questions that haven't even arisen yet. What if the information needed for these predictions were already available and constantly updated? If one could harness that information, asking new and innovative questions of it, do you think the world would change? Count on it. The watchword is Big Data, and it has the power to revolutionize the way we think about using information.
What is Big Data?
Big Data, at its core, refers to data that is too massive in size, too fast in its creation, or too diverse in its structure to process, store, and analyze with traditional databases and tools. New technologies have emerged with the ability to analyze these unimaginably large and varied data sets at speeds measured in minutes and hours, not days, weeks, or even years. They provide answers by processing data sets that were previously impractical to analyze. This powerhouse of information can pull from virtually any digital data ever recorded: every 'like' on Facebook, every value card swiped, every advertisement clicked, at what time, by whom, using which device, along with recent purchases and any other source of semi-structured or unstructured data. Big Data sidesteps the concepts of sample groups, marketing research surveys, and margins of error. It has the potential not just to note a spike in sales at a certain point, but to explain precisely why sales spiked by reporting the cyber trail of information that created the demand, all within a reasonable timeframe for action.
What’s New? Isn’t that Data Warehousing?
Traditional data warehousing is complex but organized, categorized, and interconnected. It generally holds internal information exclusive to a single business, built with a preconceived purpose in mind. Data Warehousing uses approximately 90% historical data and only 10% new data, since new data takes time to format into such a highly structured form. Think of it as the overall big picture of a business's obvious relationships. Data Warehousing is now referred to as the "traditional method."
Big Data is the way to fill in the details and discover the connections that make the picture complete. It can make sense of the mountain of loosely structured, uncategorized information, with few defined interrelationships, that was previously viewed as an inaccessible trash-pile of data. Additionally, Big Data can analyze multiple sources of data from different sectors, effectively pulling out correlations that would have been impossible to find using traditional Data Warehousing.
How is it analyzed?
Big Data isn't something that can be analyzed using traditional methods; there is simply too much information to organize and structure into something sensible. Analyzing and structuring data had to be rethought. One answer is called Hadoop. Doug Cutting created Hadoop while working at Yahoo!. It is an Apache project written in Java, and it is now built and developed by multiple contributors elsewhere. Cutting later left Yahoo! to work for Cloudera, a company that develops and distributes Hadoop to make it accessible to enterprise IT.
Hadoop is made up of several components that work together to store and analyze data. The Hadoop Distributed File System (HDFS) stores data in chunks across many machines, and MapReduce sends the computation out to those chunks so they can be analyzed simultaneously and separately, with the partial results then combined and reanalyzed to answer the question at hand. Traditional methods instead gather query results into one place for a single analysis engine. In effect, Hadoop chews the information before digesting it, instead of attempting to digest the whole mountain at once.
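To make the map-then-combine idea concrete, here is a minimal single-machine sketch of the MapReduce pattern, using the classic word-count job as an illustration. Real Hadoop distributes the phases across a cluster via HDFS; the toy chunks and function names below are illustrative assumptions, not Hadoop's actual API.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (key, value) pair for every word in one data chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

# In a real cluster, each chunk lives on a different machine and is
# mapped in parallel; here we simply loop over them.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The point of the pattern is that `map_phase` touches only its own chunk, so it can run anywhere the data happens to sit; only the much smaller intermediate pairs need to travel across the network.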
Furthermore, different programmers are developing components for Hadoop that simplify the user experience. For example, Hive is a Hadoop-based data warehouse developed by Facebook. It makes it easier for someone unfamiliar with the specialized programming skills Hadoop requires to use it anyway. Hadoop is not the only platform for analyzing Big Data; Splunk and HPCC Systems are other systems in development.
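To illustrate what Hive adds, an analyst writes a familiar SQL-style query and Hive compiles it into MapReduce jobs behind the scenes. The sketch below pairs a HiveQL-style query (the table name and columns are hypothetical, shown only for illustration) with roughly the grouping-and-summing work that would otherwise have to be hand-coded.

```python
from collections import defaultdict

# What the analyst would write in Hive (hypothetical table and columns):
hive_query = """
SELECT region, SUM(sales)
FROM store_transactions
GROUP BY region
"""

# Roughly the aggregation logic Hive generates from that query,
# run here over toy stand-in rows instead of a cluster:
rows = [("east", 120), ("west", 80), ("east", 40)]

totals = defaultdict(int)
for region, sales in rows:   # scan each row (the "map" side)
    totals[region] += sales  # sum per region (the "reduce" side)

print(dict(totals))  # {'east': 160, 'west': 80}
```

The appeal is that the query stays three readable lines no matter how the underlying job is parallelized.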
Other Big Data technologies and approaches besides Hadoop are also being developed. They include massively parallel processing data warehousing, streaming Big Data analytics engines, and a myriad of NoSQL data stores.
What’s the point?
Big Data is currently growing at a rapid clip. It moves beyond traditional data analytics to improve time-to-value. Early analytic projects using Big Data showed that businesses were able to shrink the time-to-value cycle from 12 months to 3 months, with a break-even point at 4 months instead of the typical 24. This means the business benefits from the results of the analysis far sooner than with traditional methods. Being able to analyze large amounts of data at once puts us one step closer to analyzing data in real time.
Here’s some food for thought. Knowing massive amounts of data are now being crunched at faster and faster paces, what does this mean for the landscape of business? Where can someone go to gather the massive amounts of relevant and timely data required to predict market outcomes? Who has 845 million active users who willingly supply their every whim and desire on a daily basis? Ever wonder why Facebook’s IPO is valued at $100 billion?