Big Data Solution (MPP) vs. Traditional Data Warehouse Appliance: Financial Comparison

Financial Comparison of Big Data MPP Solution and Data Warehouse Appliance

Last Update: Feb 03, 2016 | 10:10

Originating Author: David Floyer

#memeconnect #emc #Big Data

1 Executive Summary
2 Introduction
3 Traditional Data Warehouse Approach
4 Big Data Approach
5 IT Cost Comparisons
6 Business Benefit Assumptions
7 Conclusions

Executive Summary

Big data is a topic of significant interest to users and vendors at the moment. Wikibon has completed significant research in this area to define big data, to differentiate big data projects from traditional data warehousing projects and to look at the technical requirements. In this paper Wikibon looks at the business case for big data projects and compares them with traditional data warehouse approaches.

The bottom line is that for big data projects, the traditional data warehouse approach is more expensive in IT resources, takes much longer to deliver, and provides a less attractive return on investment. However, big data projects use new and less mature technologies and carry more risk. In addition, big data technologies are unlikely to be suitable for traditional data projects, and vice versa; as is so often the case, it is a question of horses for courses.

Figure 1: Comparison of Cumulative Cash Flows of Customer Experience Project using a Big Data Solution (MPP) vs. a Traditional Data Warehouse Appliance
Source: Wikibon 2011

Figure 1 shows the results of a composite case study. It compares the cumulative cash flows of a project to evaluate customer experience under two different strategies:

A traditional warehouse approach using a best-of-breed data warehouse appliance (Oracle Exadata) for the data warehouse and data analytics (this composite analysis was done after the project was completed).
A big data approach that used CR-X to define the model and data requirements iteratively, an MPP database (Greenplum) to load the data quickly after each iteration, and big data analytic tools (ClickFox and Merced).

The project favored the big data approach because:

The data was distributed through many systems both inside and outside the organization.
The data schema was simple and “flat”, using event times to infer the customer experience.
The quality and availability of the data were unknown at the start, and many iterations were needed before the right data could be selected and transformed.
The MPP database engine was very fast to load and run as the processing was done where the data was stored.
Very, very large amounts of data needed to be extracted. It was not possible to centralize the data before analysis except by taking a very restricted sampling approach, unsuitable for this particular project.

The financial metrics of the two approaches were overwhelmingly in favor of the Big Data approach:

Big Data Approach:
- Cumulative 3-year Cash Flow - $152M,
- Net Present Value - $138M,
- Internal Rate of Return (IRR) - 524%,
- Breakeven – 4 months.
Traditional DW Appliance Approach:
- Cumulative 3-year Cash Flow - $53M,
- Net Present Value - $46M,
- Internal Rate of Return (IRR) - 74%,
- Breakeven – 26 months.

The conclusion is that for big data projects different IT tools and approaches are needed. When used, these tools can dramatically reduce the time-to-value – in this case from more than two years to less than four months. The result is that many more speculative projects can be run and abandoned if necessary.

Introduction

Wikibon talked to a number of Wikibon members who had traditional data warehouses and some who had initiated big data solutions using MPP architectures. This composite case study compares different analytical solutions to a big data problem.

The core of the problem is to understand the true customer experience. Most organizations have multiple customer touch points, including call operational systems, call centers, Web sites, chat services, retail stores, and partner services. Customers are free to use all of these touch points, and do. In the case of a mobile phone operator, each touch point can be measured individually, but the measurement systems do not necessarily reflect the overall customer experience or show the combined effects of all the touch points.

Traditional Data Warehouse Approach

Many hundreds of systems are distributed throughout the organization and its partners. Each system is largely independent, and any customer experience data is concentrated within that system. The traditional data warehouse approach would have required extensive data definition work with each of the systems and extensive transfer of data from each of them. Many of the data sources are incomplete, do not use the same definitions, and are not always available. Copying all the data from each system to a centralized location and keeping it updated is infeasible. Sampling the data would have been very problematic, as the objective was to construct a customer experience view over time from all the events that took place. Sampling by specific customers would have been very difficult. From a traditional data warehouse point of view, this would have been a project from hell. The timescale for implementing this project, revising it, and implementing any results was estimated to be at least one year.

Big Data Approach

The alternative big data approach is essentially to iterate to a result. In this case a modeling tool called CR-X was used to define potential relationships to customer experience from the data. Data was extracted from the disparate sources using traditional extract tools (newer techniques such as Hadoop may be considered in the future) and loaded into an MPP database (Greenplum). The data schema was fairly simple and “flat”, which suited a database architecture where the processing is done where the data resides. This allows much faster data loading and analysis than traditional data warehouse appliances. Specific customer experience analytical packages (ClickFox and Merced) were used to analyze the data as part of the iterative process.
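
The idea of doing the processing where the data resides can be illustrated with a small Python toy. This is only a sketch under stated assumptions, not a description of how Greenplum works internally: the event fields, the sample events, the `NUM_SEGMENTS` value and all function names are hypothetical. Flat event records are hash-distributed by customer, so each segment can rebuild its own customers' journeys from event times without moving data between segments.

```python
from collections import defaultdict
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical number of MPP segments

# Flat event records: (customer_id, touch_point, event_time). Illustrative data only.
EVENTS = [
    ("c1", "web", 10),
    ("c2", "call_center", 12),
    ("c1", "call_center", 25),
    ("c2", "retail_store", 30),
    ("c1", "chat", 18),
]


def segment_for(customer_id):
    """Pick a segment with a stable hash so one customer's events stay together."""
    return crc32(customer_id.encode("utf-8")) % NUM_SEGMENTS


def distribute(events):
    """Load step: route each flat event to the segment that owns its customer."""
    segments = defaultdict(list)
    for event in events:
        segments[segment_for(event[0])].append(event)
    return segments


def local_journeys(segment_events):
    """Runs on one segment: order each customer's touch points by event time."""
    journeys = defaultdict(list)
    for customer_id, touch_point, _ in sorted(segment_events, key=lambda e: e[2]):
        journeys[customer_id].append(touch_point)
    return dict(journeys)


def all_journeys(events):
    """Merge per-segment results; no customer spans two segments, so update() is enough."""
    merged = {}
    for segment_events in distribute(events).values():
        merged.update(local_journeys(segment_events))
    return merged


print(all_journeys(EVENTS))
```

Because the schema is flat and keyed by customer, the only cross-segment step is the final merge of small per-customer results, which is why this kind of workload suits an MPP layout.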

IT Cost Comparisons

The core assumptions for IT costs are shown in Table 1:

Table 1: IT Cost Assumptions for Alternative Approaches to a Customer Experience Project
Source: Wikibon 2011

Three alternative approaches were analyzed:

A traditional data warehousing approach using a roll-your-own (RYO) approach supplied by a systems integrator (SI). This required 20% less initial IT capital cost than a single SKU solution but was more expensive in support costs, as the maintenance of each component had to be done by the customer. The reference model was normalized to an Oracle database. (There were multiple installed alternatives that could have been used.)
The second case used a data warehousing appliance provided by the supplier as a single SKU, including all the software. The appliance was based on Oracle Exadata, and components included a hypervisor, Linux operating system, and database operational middleware. Support from Oracle would have come as a single update applied to all components simultaneously. This system was not directly assessed by the customer because it was unavailable at the time. However, as the results in Figure 2 show, it would have been significantly more cost-effective than the RYO alternative.
The third approach considered was a big data solution using an MPP database (Greenplum). The cost of the hardware and software was about 40% of the cost of a traditional SI RYO data warehousing system.
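
The relative initial costs stated above can be worked through with a few lines of arithmetic. The sketch below is illustrative only: it normalizes the single SKU appliance to 1.0 (an arbitrary choice) and assumes that the "initial IT capital cost" and the "cost of the hardware and software" refer to comparable quantities, which the source does not state explicitly. The function name is hypothetical.

```python
def relative_initial_costs(single_sku=1.0):
    """Express initial costs relative to the single SKU appliance (normalized to 1.0)."""
    ryo = single_sku * (1 - 0.20)  # RYO: 20% less initial capital cost than single SKU
    mpp = ryo * 0.40               # MPP: about 40% of the RYO hardware and software cost
    return {"single_sku": single_sku, "ryo": ryo, "mpp": mpp}


costs = relative_initial_costs()
for name, value in costs.items():
    print(f"{name}: {value:.2f}")
```

These ratios cover initial costs only. Support costs differ between the approaches (the RYO system is more expensive to maintain), so multi-year totals will not follow the same proportions.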

Figure 2: Comparison of 3-year IT Costs for Customer Experience Project using a Big Data Solution (MPP), a Traditional Data Warehouse Appliance RYO Solution from a Systems Integrator and a Single SKU Appliance Solution
Source: Wikibon 2011, based on data in Table 3 in the footnotes

Figure 2 shows the IT cost results of the three approaches over five years.

The source of this data is the detailed five-year table shown in Table 3 in the footnotes. The big data solution was the least-cost solution for this project, at about 40% of the cost of the next-best option, the single SKU appliance.

Business Benefit Assumptions

The core assumptions for the business benefits are shown in Table 2:

Table 2: Business Benefit Assumptions
Source: Wikibon 2011

Only the two lowest-cost approaches from the IT cost comparison (the big data solution and the single SKU appliance) were analyzed for business benefits. The project had two phases. The business benefits were considered confidential by the customer and were not discussed in detail. From the information given, the benefits for phase one are conservatively assumed to be $3M/month, rising to $6M/month after the implementation of phase two. The same customer experience benefits were applied to both IT approaches. The key difference was that the big data solution (MPP) could start achieving benefits in three months, whereas the time taken to start accruing benefits with the data warehouse appliance was assessed to be 12 months.
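
The effect of the different start dates can be sketched as a simple benefit schedule. This is a hedged illustration, not Wikibon's model: the function name and the `phase_two_delay` parameter are hypothetical, the value of six months for phase one is an assumption (the source does not say when phase two begins), and "in three months" is read here as benefits beginning in month 3. The sketch ignores IT costs, so it will not reproduce the published cash flow figures.

```python
def monthly_benefits(start_month, phase_two_delay, horizon=36,
                     phase_one=3.0, phase_two=6.0):
    """Return monthly benefits in $M for months 1..horizon.

    start_month: first month in which phase-one benefits accrue.
    phase_two_delay: months of phase-one benefits before phase two begins
                     (hypothetical; the source does not give this figure).
    """
    benefits = []
    for month in range(1, horizon + 1):
        if month < start_month:
            benefits.append(0.0)        # project not yet delivering
        elif month < start_month + phase_two_delay:
            benefits.append(phase_one)  # $3M/month in phase one
        else:
            benefits.append(phase_two)  # $6M/month after phase two
    return benefits


# Assumed reading: benefits begin in month 3 (MPP) or month 12 (appliance).
mpp = monthly_benefits(start_month=3, phase_two_delay=6)
appliance = monthly_benefits(start_month=12, phase_two_delay=6)
print("Gross benefit gap over 36 months ($M):", sum(mpp) - sum(appliance))
```

Since both approaches earn the same monthly benefit once running, the gap comes entirely from the nine-month head start, valued at the steady-state monthly rate.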

Conclusions

The main financial conclusions are shown in Figure 1 in the executive summary. The comparison between the big data approach and the traditional DW appliance approach can be seen by comparing the key financial metrics:

Cumulative 3-year Cash Flow - $152M vs. $53M,
Net Present Value - $138M vs. $46M,
Internal Rate-of-Return (IRR) - 524% vs. 74%,
Break-even – 4 months vs. 26 months.

The project would probably not have been started using the traditional data warehousing techniques, as the IRR of 74% would have been below the hurdle rate for high-risk projects, and the break-even of 26 months too long for the current economic environment.
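
For readers who want to see how metrics of this kind are derived from a cash flow series, the following Python sketch computes NPV, an annualized IRR and a break-even month from monthly cash flows. It is a minimal illustration under assumptions, not Wikibon's model: the function names, the bisection search, the discounting convention (month 0 is undiscounted) and the sample flows are all hypothetical, and the sample numbers are not taken from the study.

```python
def npv(annual_rate, monthly_flows):
    """Net present value of monthly cash flows, discounted at an annual rate."""
    monthly_rate = (1 + annual_rate) ** (1 / 12) - 1
    return sum(cf / (1 + monthly_rate) ** t for t, cf in enumerate(monthly_flows))


def irr(monthly_flows, low=-0.99, high=100.0, tol=1e-7):
    """Annualized IRR by bisection.

    Assumes conventional flows (outlay first, benefits later), so that NPV
    falls as the rate rises and crosses zero once between low and high.
    """
    for _ in range(200):
        mid = (low + high) / 2
        if npv(mid, monthly_flows) > 0:
            low = mid   # rate too low: NPV still positive
        else:
            high = mid  # rate too high: NPV zero or negative
        if high - low < tol:
            break
    return (low + high) / 2


def breakeven_month(monthly_flows):
    """First month index at which cumulative cash flow is non-negative, or None."""
    total = 0.0
    for t, cf in enumerate(monthly_flows):
        total += cf
        if total >= 0:
            return t
    return None


# Hypothetical flows in $M: an outlay in month 0, then benefits from month 3.
example_flows = [-10.0, 0.0, 0.0] + [3.0] * 9
print("NPV at 10%:", round(npv(0.10, example_flows), 2))
print("IRR:", round(irr(example_flows), 2))
print("Break-even month:", breakeven_month(example_flows))
```

A project is typically approved only if its IRR exceeds the organization's hurdle rate, which is the comparison made above for the 74% figure.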

The main conclusions drawn from this study are:

Appliances are best when they have a single SKU, and are supported by single, tested updates to all the components of the appliance;
Appliances will increasingly become the way that traditional data warehouses are provisioned;
Big data projects require different IT tools and approaches. When used, these tools can dramatically reduce the time-to-value – in this case from more than two years to less than four months;
Big data projects will tend to be more speculative and will need tight management review and a willingness to abandon them when necessary;
Data warehouses will be a significant source of data for big data projects;
Successful big data projects are likely to be folded back into the data warehouse as data extraction capabilities are built into operational systems;
In the era of big data, businesses and suppliers will need to adapt to shorter and more intense projects where the outcome is less certain and the IT resources are much more likely to be provided by service providers.

Action Item: Big Data projects are real and can lead to enormous business benefits in a short period of time. These projects are likely to be led by the business, and IT should separate these projects from the traditional data warehousing groups to ensure that new big data thinking and approaches can be adopted.

Footnotes: Table 3 below shows the five-year IT cost analysis of the three approaches, and is the source of the IT costs in Figures 1 and 2.

Table 3: 5-year Comparison of IT Costs for a Customer Experience Project using a Big Data Solution (MPP), a Traditional RYO Data Warehouse Appliance from an SI and a Traditional Single SKU Data Warehouse
Source: Wikibon 2011

Comments on 'Financial Comparison of Big Data MPP Solution and Data Warehouse Appliance'

Impressive. As I read this terrific study, it clearly shows that big data does not replace data warehousing. Rather both have legitimate but different uses and will co-exist in the enterprise. And to an extent they will provide data to each other when appropriate.

If that is correct, then the important issue I see is in defining projects carefully to determine whether they are more appropriate for traditional DW or for big data approaches.

Posted By: Bert Latamore | Mon Mar 07, 2011 11:46

From a purely infrastructure standpoint, yes. But there are many more considerations from a business perspective including objectives, monetization strategies, pricing strategies, open source angles, community plays, roadmap, maintainability, skills sets, etc.

Posted By: David Vellante | Mon Mar 07, 2011 12:10

Regarding the cost table (http://wikibon.org/w/images/3/3d/MPPvsDW_Table3.JPG), can you provide the breakdown of $11,469,803 into the following categories:
1) AP
2) Storage
3) Memory
4) Others?

Thank you

Posted By: chuckpiercey | Thu May 16, 2013 11:29
