#memeconnect #emc #bigdata
Big data is a topic of significant interest to users and vendors at the moment. Wikibon has conducted extensive research in this area to define big data, to differentiate big data projects from traditional data warehousing projects, and to examine the technical requirements. In this paper Wikibon looks at the business case for big data projects and compares them with traditional data warehouse approaches.
The bottom line is that for big data projects, the traditional data warehouse approach is more expensive in IT resources, takes much longer to complete, and provides a less attractive return on investment. However, big data projects use new and less mature technologies and carry more risk. In addition, big data technologies are unlikely to be suitable for traditional data projects and vice versa – as is so often the case, it is a question of horses for courses.
The results of a composite case study are shown in Figure 1, which compares the cumulative cash flows for a project for evaluating customer experience for two different strategies:
- A traditional warehouse approach using a best-of-breed data warehouse appliance (Oracle Exadata) for the data warehouse and data analytics (this composite analysis was done after the project was completed).
- A big data approach that used CR-X to define the model and data requirements iteratively, an MPP database (Greenplum) to load the data quickly after each iteration, and big data analytic tools (ClickFox and Merced).
The project favored the big data approach because:
- The data was distributed through many systems both inside and outside the organization.
- The data schema was simple and “flat”, using inference on event times to establish the customer experience.
- The quality and availability of data was unknown at the start and needed many iterations before the right data could be selected and transformed.
- The MPP database engine was very fast to load and run as the processing was done where the data was stored.
- Very large amounts of data needed to be extracted. It was not possible to centralize the data before analysis except by taking a very restricted sampling approach, which was unsuitable for this particular project.
The financial metrics of the two approaches were overwhelmingly in favor of the big data approach:
- Big Data Approach:
- Cumulative 3-year Cash Flow - $152M,
- Net Present Value - $138M,
- Internal Rate of Return (IRR) - 524%,
- Breakeven – 4 months.
- Traditional DW Appliance Approach:
- Cumulative 3-year Cash Flow - $53M,
- Net Present Value - $46M,
- Internal Rate of Return (IRR) - 74%,
- Breakeven – 26 months.
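The four metrics above are all derived from the same monthly cash-flow series. As a rough illustration of how they are computed, the sketch below uses hypothetical monthly cash flows (the case study's underlying series is confidential and not published here), implementing NPV and IRR directly rather than relying on a finance library:

```python
# Illustrative only: hypothetical cash flows, not this case study's
# actual (confidential) project figures.

def npv(rate_annual, cash_flows):
    """Net present value of monthly cash flows, discounted at an annual rate."""
    r = (1 + rate_annual) ** (1 / 12) - 1  # equivalent monthly rate
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=20.0, tol=1e-6):
    """Annualized internal rate of return, found by bisection on NPV."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid  # NPV still positive: the IRR is higher
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical project: $10M up-front cost, then $1M/month net benefit
# for the rest of a 36-month window (all figures in $M).
flows = [-10.0] + [1.0] * 35

cumulative = sum(flows)
breakeven_month = next(m for m in range(len(flows))
                       if sum(flows[:m + 1]) >= 0)
print(f"Cumulative 3-year cash flow: ${cumulative:.0f}M")
print(f"NPV at a 10% discount rate:  ${npv(0.10, flows):.1f}M")
print(f"IRR: {irr(flows):.0%}, breakeven at month {breakeven_month}")
```

Note how sensitive IRR and breakeven are to when benefits start flowing: pulling the first benefit month forward shortens the payback period and compounds through every later metric, which is the effect the case study reports.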
The conclusion is that for big data projects different IT tools and approaches are needed. When used, these tools can dramatically reduce the time-to-value – in this case from more than two years to less than four months. The result is that many more speculative projects can be run and abandoned if necessary.
Wikibon talked to a number of Wikibon members who had traditional data warehouses and some that had initiated big data solutions using MPP architectures. This composite case study compares different analytical solutions to a big data problem.
The core of the problem is to understand the true customer experience. Most organizations have multiple customer touch points, including operational systems, call centers, Web sites, chat services, retail stores, and partner services. Customers are free to use, and do use, all these touch points. In the case of a mobile phone operator, each can be measured individually, but the measurement systems do not necessarily reflect the overall customer experience or show the combined effects of all the touch points.
Traditional Data Warehouse Approach
Many hundreds of systems are distributed throughout the organization and its partners. Each system is largely independent, and any customer experience data is concentrated within that system. The traditional data warehouse approach would have required extensive data definition work with each of the systems and extensive transfer of data from each of them. Many of the data sources are incomplete, do not use the same definitions, and are not always available. Copying all the data from each system to a centralized location and keeping it updated is infeasible. Sampling the data would have been very problematic, as the objective was to construct a customer experience view over time from all the events that took place, and sampling by specific customers would have been very difficult. From a traditional data warehouse point of view, this would have been a project from hell. The timescale for implementing this project, revising it, and implementing any results was estimated to be at least one year.
Big Data Approach
The alternative big data approach is essentially to iterate to a result. In this case a modeling tool called CR-X was used to define potential relationships to customer experience from the data; data was extracted from the disparate sources using traditional extract tools (newer techniques such as Hadoop may be considered in the future) and loaded into an MPP database (Greenplum). The data schema was fairly simple and “flat”, which suited a database architecture where the processing is done where the data resides. This allows much faster data loading and analysis than traditional data warehouse appliances. Specific customer experience analytical packages (ClickFox and Merced) were used to analyze the data as part of the iterative process.
IT Cost Comparisons
The core assumptions for IT costs are shown in Table 1:
Three alternative approaches were analyzed:
- A traditional data warehousing approach using a roll-your-own (RYO) approach supplied by a systems integrator (SI). This required 20% less initial IT capital cost than a single-SKU solution but was more expensive in support costs, as the maintenance of each component had to be done by the customer. The reference model was normalized to an Oracle database. (There were multiple installed alternatives that could have been used.)
- The second case used a data warehousing appliance provided by the supplier as a single SKU, including all the software. The software was based on Oracle Exadata, and components included a hypervisor, a Linux operating system, and database operational middleware. Support from Oracle would have been via a single update applied to all components simultaneously. This system was not directly assessed by the customer because it was unavailable at the time. However, as the results in Figure 2 below show, it would have been significantly more cost-effective than the RYO alternative.
- The third approach considered was a big data solution using an MPP database (Greenplum). The cost of the hardware and software was about 40% of the cost of a traditional SI RYO data warehousing system.
Figure 2 shows the IT cost results of the three approaches over five years.
The source of this data was the detailed five-year table shown in Table 3 in the footnotes. The big data solution was the least-cost solution for this project, at about 40% of the cost of the next-best solution, the single-SKU appliance.
Business Benefit Assumptions
The core assumptions for the business benefits are shown in Table 2:
Only the best two approaches from the IT cost comparisons were analyzed for business benefits. The project had two phases. The business benefits were considered confidential by the customer and were not discussed in detail. From the information given, the benefits for phase one are conservatively assumed to be $3M/month, rising to $6M/month after the implementation of phase two. The same customer experience benefits were applied to both IT approaches. The key difference was that the big data solution (MPP) could start achieving benefits in three months, whereas the time taken to start accruing benefits with the data warehouse appliance was assessed to be 12 months.
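The timing difference alone drives most of the gap between the two cash-flow curves. The sketch below models the benefit ramp just described: the $3M and $6M monthly figures and the 3- versus 12-month starts come from the case study, while the phase-two start month is a hypothetical placeholder and IT costs are deliberately omitted:

```python
# Benefit-timing sketch. The $3M/$6M monthly benefits and the 3- vs
# 12-month starts come from the article; the phase-two start month is
# a hypothetical placeholder, and IT costs are deliberately omitted.

def monthly_benefit(month, start, phase2_start, phase1=3.0, phase2=6.0):
    """Benefit ($M) accrued in a given month under a two-phase ramp."""
    if month < start:
        return 0.0  # no benefits until the solution goes live
    return phase2 if month >= phase2_start else phase1

HORIZON = 36       # 3-year analysis window, in months
PHASE2_START = 12  # hypothetical: phase two live at month 12 (big data)

# The appliance goes live 9 months later, so its phase two slips by 9 months too.
big_data = sum(monthly_benefit(m, start=3, phase2_start=PHASE2_START)
               for m in range(HORIZON))
appliance = sum(monthly_benefit(m, start=12, phase2_start=PHASE2_START + 9)
                for m in range(HORIZON))

print(f"Big data 3-year gross benefits:  ${big_data:.0f}M")
print(f"Appliance 3-year gross benefits: ${appliance:.0f}M")
```

Even with identical benefit rates, the nine-month head start carries through the whole three-year window, which is why the cumulative cash-flow curves in Figure 1 diverge so sharply.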
The main financial conclusions are shown in Figure 1 in the executive summary. The comparison between the big data approach and the traditional DW appliance approach can be seen by comparing the key financial metrics:
- Cumulative 3-year Cash Flow - $152M vs. $53M,
- Net Present Value - $138M vs. $46M,
- Internal Rate-of-Return (IRR) - 524% vs. 74%,
- Breakeven – 4 months vs. 26 months.
The project would probably not have been started using the traditional data warehousing techniques, as the IRR of 74% would have been below the hurdle rate for high-risk projects, and the break-even of 26 months too long for the current economic environment.
The main conclusions drawn from this study are:
- Appliances are best when they have a single SKU, and are supported by single, tested updates to all the components of the appliance;
- Appliances will increasingly become the way that traditional data warehouses are provisioned;
- Big data projects require different IT tools and approaches. When used, these tools can dramatically reduce the time-to-value – in this case from more than two years to less than four months;
- Big data projects will tend to be more speculative and will need tight management review and a willingness to abandon them when necessary;
- Data warehouses will be a significant source of data for big data projects;
- Successful big data projects are likely to be folded back into the data warehouse as data extraction capabilities are built into operational systems;
- In the era of big data, businesses and suppliers will need to adapt to shorter and more intense projects where the outcome is less certain and the IT resources are much more likely to be provided by service providers.
Action Item: Big Data projects are real and can lead to enormous business benefits in a short period of time. These projects are likely to be led by the business, and IT should separate these projects from the traditional data warehousing groups to ensure that new big data thinking and approaches can be adopted.
Footnotes: Table 4 below shows the five-year IT cost analysis of the three approaches and is the source of the IT costs in Figures 1 and 2.