Look at the data management architecture and technology portfolio of any large enterprise and you will more than likely find a heterogeneous collection of databases, data warehouses, data integration tools, and business intelligence applications from multiple vendors. Over the years, most large enterprises have spent many millions of dollars procuring, deploying, and operating these data management technologies, which today support many mission-critical business processes and revenue-generating applications. While many of these technologies are growing long in the tooth and cost enterprise customers millions of dollars a year in maintenance and support, they nonetheless form the backbone of many enterprise operations and are not going away any time soon.
It is against this backdrop that Hadoop makes its appearance. The open source Big Data platform began life as a set of related data storage and processing approaches developed by Web giants like Google and Yahoo to solve specific tasks (first among them, indexing the World Wide Web). But Hadoop quickly evolved into a general-purpose framework supporting multiple analytic use cases. A number of forward-thinking enterprises took notice as, simultaneously, the ever-increasing volume, variety, and velocity of data (a.k.a. Big Data) raining down on the enterprise began to overwhelm the traditional data management stack. According to feedback from the Wikibon community, many data management practitioners today are eager to leverage Hadoop to relieve some of this pressure on existing data management infrastructure and to support new, differentiating analytic capabilities.
Many enterprise early adopters have built out, and continue to build out, their Hadoop capabilities without engaging commercial technology providers. Specifically, just 25% of current Hadoop practitioners are paying customers of one or another of the several Hadoop distribution vendors currently on the market. However, as these practitioners transition from Hadoop pilot projects to full-scale production deployments, most will turn to commercial vendors for related software and support services. Feedback from the Wikibon community backs up this assertion. This trend will only increase as the Hadoop adoption curve hits the early mainstream and beyond, as most enterprises lack the internal skills and expertise required to support production-grade Hadoop deployments.
Hadoop Integration Requires Tight Partnerships
There are a number of criteria against which enterprise data management practitioners should evaluate commercial Hadoop distribution vendors. One of the more important of these criteria is partnership strategy. Partnerships are important because Hadoop does not exist in a vacuum. Wikibon believes Hadoop is a foundational technology in the modern data architecture due to its ability to support analytics on data of any structure and to scale out efficiently and affordably, but it must integrate tightly with existing data management technologies in order to deliver maximum value and avoid becoming yet another data silo in the enterprise.
For example, many practitioners use Hadoop as a "data lake" where data transformations take place and data scientists perform exploratory analytics against large volumes of multi-structured data (which is difficult, if not impossible, to do using traditional technologies and approaches). However, data lakes need to be filled. This requires data integration and data movement tools to feed data from source systems (both internal and external to the enterprise) into Hadoop. Then, some transformation and analytic functions need to be pushed down into Hadoop to leverage its computational power in addition to its low-cost storage capabilities. Resulting insights and models often need to be exported to existing enterprise data warehouse (EDW) and departmental data mart environments for further analysis and reporting, as well as to operational database environments, including existing relational databases and emerging NoSQL data stores. And all this must be orchestrated intelligently via middleware to ensure environments are operating efficiently and that data gets to where it needs to be when it needs to be there.
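The data lake flow described above (fill, push down, export, orchestrate) can be sketched in a few lines of Python. This is a toy illustration only: plain in-memory dictionaries stand in for source systems, Hadoop, the EDW, and an operational database, and every name in it (`ingest`, `push_down_transform`, `export`, `run_pipeline`, the record fields, and the `threshold` parameter) is hypothetical rather than part of any Hadoop or vendor API.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
State = Dict[str, object]


def ingest(state: State) -> None:
    """Fill the lake: copy records from every source system into one place."""
    sources: Dict[str, List[Record]] = state["sources"]
    state["lake"] = [rec for records in sources.values() for rec in records]


def push_down_transform(state: State) -> None:
    """Run the aggregation where the data lives instead of moving raw data out."""
    totals: Dict[str, int] = {}
    for rec in state["lake"]:
        totals[rec["customer"]] = totals.get(rec["customer"], 0) + rec["amount"]
    state["insights"] = totals


def export(state: State) -> None:
    """Publish results to the stand-in EDW and operational store."""
    insights: Dict[str, int] = state["insights"]
    state["edw"] = dict(insights)
    # Hypothetical rule: only high-value customers reach the operational store.
    state["operational_db"] = {
        c: t for c, t in insights.items() if t >= state["threshold"]
    }


def run_pipeline(sources: Dict[str, List[Record]], threshold: int = 100) -> State:
    """Orchestrate the stages in order and record which ones completed."""
    state: State = {"sources": sources, "threshold": threshold, "completed": []}
    stages: List[Callable[[State], None]] = [ingest, push_down_transform, export]
    for stage in stages:
        stage(state)
        state["completed"].append(stage.__name__)
    return state


if __name__ == "__main__":
    demo_sources = {
        "crm": [{"customer": "acme", "amount": 70}],
        "weblogs": [
            {"customer": "acme", "amount": 50},
            {"customer": "globex", "amount": 20},
        ],
    }
    result = run_pipeline(demo_sources)
    print(result["edw"])
    print(result["operational_db"])
```

In a real deployment each stage would typically be a separate vendor's tool, which is why the hand-offs between stages are where partnership quality shows up.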
In short, while powerful and full of potential, the modern data architecture can be complex. Effective partnerships between Hadoop distribution vendors and the other players in the data management stack make it less so. The tighter the integration between Hadoop and related data management technologies, the shorter the time-to-value for enterprise Big Data practitioners. Hadoop requires a vibrant and active ecosystem of vendors cooperating with one another to enable enterprise practitioners to smoothly and effectively integrate it into existing environments. This also provides practitioners with choice, allowing them to select the best tools and technologies for their particular circumstances while reducing the risk of vendor lock-in.
Hadoop Partnership Evaluation Criteria
Wikibon recommends enterprise data management practitioners evaluate the partnership strategies of the various Hadoop distribution vendors based on the following:
1. State of Partnerships with Strategic Technology Suppliers: Enterprise practitioners should evaluate Hadoop vendors based in part on whether they have existing partnerships with their own strategic data management technology providers. This is particularly important as it relates to data warehouse vendors.
While Wikibon believes Hadoop capabilities will increasingly converge with those of the EDW, the reality today and for the next two-to-five years is that the EDW will remain the superior technology for a number of strategic analytic and reporting capabilities. The EDW is also the most common point-of-contact (via business intelligence tools) relative to corporate data assets for data analysts and business users alike. For these reasons, it is critical that existing EDWs provide a gateway to Hadoop for these users in the form of optimized query tools. Such tools abstract away the underlying data storage and processing complexity inherent in Hadoop/MapReduce and allow end-users to interact with Hadoop-based data via an SQL-like interface, as if it were natively stored in the EDW.
In addition, a common initial Hadoop use case is data warehouse offloading, in which appropriate workloads currently performed in an EDW are shifted to Hadoop in order to take advantage of the latter’s lower costs and superior scalability. This has the additional benefit of making the resulting data available for further integration and analysis with unstructured/multi-structured data already stored in Hadoop. The tighter the integration between the two (in the form of fully parallel data connectors but also integrated workload management tools), the faster and easier it is for practitioners to identify and move workloads between an EDW and Hadoop.
Enterprise practitioners should not only ensure that any Hadoop vendor under consideration has existing partnerships with their EDW technology provider but should also press Hadoop vendors on these two points of integration in particular (optimized query access from the EDW into Hadoop, and parallel data connectors with integrated workload management). Beyond data warehouse vendors, enterprise practitioners should ensure Hadoop vendors are partnered with other strategic technology providers related to data integration, operational databases, and security/governance/management software.
2. Depth of Partnerships: The mere existence of partnerships between Hadoop vendors and data management technology providers is just step one. As suggested above, depth of partnerships is also important. With all the hype surrounding Big Data, technology vendors of all types (both those whose products are based on or designed specifically for Hadoop as well as “traditional” data management technology providers) scrambled to form partnerships with Hadoop vendors to ride the Big Data wave.
The partnership frenzy continues today. But not all partnerships are created equal, and enterprise practitioners must do their due diligence to distinguish partnerships with real substance from those that are “Barney” partnerships (I love you, you love me).
The first substantive step in the partnership relationship is certification, which most of the Hadoop vendors offer. Certification simply means the partner technology has been tested against the Hadoop vendor’s platform and meets minimum requirements for interoperability. Certifications are table stakes when evaluating Hadoop partnerships.
A more important marker of depth-of-partnership than certification is the level of joint engineering between the Hadoop vendor and partner. Specifically, enterprise practitioners should evaluate Hadoop vendors and their strategic data management technology providers based on whether the two have co-developed joint product integration/go-to-market roadmaps and, if so, the level of progress made against said roadmap.
A comprehensive roadmap should include both tactical joint-engineering plans and high-level strategic vision. It should illustrate how traditional data management vendors see themselves evolving in a world increasingly dominated by Big Data approaches, including specific steps the two vendors are taking to turn vision into reality. From the Hadoop vendor’s perspective, roadmaps should explain how it plans to take advantage of the mature enterprise-grade features around data governance, security, and reliability inherent in traditional data management technologies.
Another important area of integration to consider is support services. A number of data management technology providers have established reseller arrangements with one or another of the Hadoop distribution vendors. Most of these reseller agreements include support service offerings. In most cases, the reseller provides level one support for Hadoop-related issues, with the Hadoop vendor taking level two cases and above. Enterprise practitioners should ensure that support service details are clearly defined in any engagement and should question both resellers and Hadoop vendors on the mechanisms for engaging support services as issues escalate in importance.
3. Length of Partnerships: While length of partnership alone does not speak to its substance, integrating traditional data management technologies, some of which are decades old, with the new data storage, processing, and analytics approaches that are core to Hadoop, takes time. It requires vendors to conduct significant joint engineering and development efforts, with many starts and stops along the way. This is all the more likely in scenarios in which one of the technologies in question is supported by an active and vocal open source community, as is Hadoop. This requires additional coordination with the open source community and standards bodies, in this case the Apache Software Foundation. So while length of partnership is not directly correlated to depth of partnership, the former is a contributing factor to achieving the latter.
4. Platform Vision Alignment: It is important to ascertain the true long-term platform vision, relative to the traditional data management stack, of the various Hadoop vendors currently on the market. All the Hadoop vendors say they are committed to deep and lasting partnerships with existing data management technology vendors, but do their long-term product visions align with this? Does the Hadoop vendor have ambitions to move higher up the stack into areas that will eventually conflict with what are today strategic partners in the database, analytics, or application space?
A yes answer to these questions isn’t necessarily a bad thing, but it is important for enterprise practitioners to understand their Hadoop vendor-of-choice’s long-term platform vision and ensure it aligns with their enterprise’s strategy for building its version of the modern data architecture. For enterprise practitioners looking for a Hadoop vendor that can increasingly provide more functionality up the stack, potentially displacing legacy approaches and vendor products, a yes answer might be a good fit. Enterprise practitioners that have invested heavily in existing data management technologies and are committed to integrating them into their emerging modern data architecture where appropriate may find a Hadoop vendor that is focused exclusively on Hadoop core to be a better fit. There is no right or wrong answer, but it is important for enterprise practitioners to align with a Hadoop vendor whose long-term product vision aligns with their approach to leveraging Big Data.
5. Maturity of Partnership Mechanisms: Hadoop vendors must have the internal infrastructure and mechanisms needed to nurture the development of deep partnerships. Enterprise practitioners should press Hadoop vendors on these details. First, does the Hadoop vendor have the engineering environments and administrative processes set up to on-board new partners quickly and effectively? Second, and possibly more important, does it have the mechanisms in place to get joint-engineering efforts into the Apache Hadoop pipeline and the influence needed to shepherd these joint efforts through the pipeline and into the Apache Hadoop core?
These are important questions to answer because such infrastructure and partnership mechanisms provide traditional data management technology providers a say in the direction of Apache Hadoop. This in turn has a direct impact on enterprise practitioners and their efforts, in both the short and long-term, to integrate Hadoop with existing data management tools and technologies.
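One way to apply the five criteria above consistently across vendors is a simple weighted scorecard. The sketch below is an illustration under stated assumptions, not a Wikibon methodology: the weights, the 1-to-5 rating scale, the criterion keys, and the function names `weighted_score` and `rank_vendors` are all hypothetical, and practitioners would substitute weights that reflect their own priorities.

```python
from typing import Dict, List

# Hypothetical weights for the five criteria; adjust to your own priorities.
CRITERIA_WEIGHTS: Dict[str, int] = {
    "strategic_supplier_partnerships": 3,
    "depth_of_partnerships": 3,
    "length_of_partnerships": 1,
    "platform_vision_alignment": 2,
    "partnership_mechanisms": 1,
}


def weighted_score(ratings: Dict[str, int],
                   weights: Dict[str, int] = CRITERIA_WEIGHTS) -> float:
    """Return the weighted average of 1-5 ratings across all criteria."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_vendors(vendor_ratings: Dict[str, Dict[str, int]]) -> List[str]:
    """Order vendor names from highest to lowest weighted score."""
    return sorted(
        vendor_ratings,
        key=lambda name: weighted_score(vendor_ratings[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up ratings for two placeholder vendors, for illustration only.
    example = {
        "vendor_a": dict(zip(CRITERIA_WEIGHTS, [5, 4, 2, 3, 4])),
        "vendor_b": dict(zip(CRITERIA_WEIGHTS, [3, 3, 5, 5, 3])),
    }
    for name in rank_vendors(example):
        print(name, round(weighted_score(example[name]), 2))
```

A scorecard like this does not replace due diligence; it simply makes the trade-offs between criteria explicit, for example how much a long-standing partnership should count against a shallow one.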
Action Item: Hadoop is a foundational technology in the emerging modern data architecture and is the font of most of the innovation taking place in the Big Data ecosystem. It plays, and will continue to play, a critical role for enterprises across vertical industries as they replace gut-driven decision-making and slow, manual business processes with data-driven decision-making and real-time, intelligent transaction execution. But Hadoop does not exist in a vacuum, and it must integrate with myriad existing data management technologies in the enterprise. It is critical, therefore, that enterprise practitioners select a Hadoop vendor that has the partnerships in place to make this integration happen as smoothly as possible in order to maximize the value of both Hadoop and existing systems. Wikibon advises enterprise practitioners to evaluate Hadoop vendors based on the above criteria in order to select the best fit for both their short-term goals and long-term Big Data vision.