The latest iteration of Informatica’s core data integration platform is about much more than just traditional ETL jobs. In fact, all the major enhancements and new features in Informatica 9.5 are focused on making it simpler and faster for enterprises to realize business value from big data. Functionally, while Informatica continues to refer to itself as “the data integration company,” 9.5 represents another step in the platform's ongoing evolution from a data integration point solution to a comprehensive data management suite.
Informatica 9.5 was motivated in part by the need to harness the power of three separate but related trends, according to Judy Ko, Vice President, Solutions and Platform Product Marketing, at Informatica. They are big transaction data, big interaction data, and big data processing.
Support for HANA, Exadata and Transactional Big Data Systems
For Informatica, big transaction data refers to large volumes of mostly structured, transactional data that can now be stored and manipulated with the new breed of databases and data warehouses represented by Oracle Exadata and SAP HANA. Version 9.5 adds new data integration capabilities for these databases as well as support for a wider variety of data formats and standards found in the financial services, telecommunications, healthcare, and government sectors.
Other enhancements and new functionality in 9.5 for big transaction data include:
- Embeddable cloud services for cloud ISVs. These include dynamic templates to allow developers to easily integrate data in multiple formats to build cloud-based applications. The cloud services are aimed largely at the OEM market as part of Informatica’s “Informatica Inside” strategy, similar to what Intel accomplished with its processors business.
- Scalable data profiling capabilities. Informatica has long been in the data profiling business, but until now its capabilities in this area were limited to 1-to-1 activities. Version 9.5 introduces the ability to perform data profiling at scale, such as identifying discrete business identities that should be targeted for encryption. Users can now, for example, profile data across “hundreds of tables” simultaneously, said Ko.
- Data workflow and data integrity functionality. The platform now includes holistic data workflow capabilities, allowing users to apply workflow rules to data as it passes through its lifecycle to ensure relevant rules and regulations are followed. It also adds incremental data validation capabilities to ensure the correct data is being loaded into production environments at the time of data movement.
Enabling Social MDM and Harnessing Machine-Generated Data
The next area of focus for Informatica 9.5 is big interaction data, by which the company means the large volumes of data created by social media and social networking, along with machine- and sensor-generated data.
Version 9.5 adds capabilities designed to support what Informatica is calling social master data management. Namely, the platform can now pull data from Facebook and other social networks into customer profiles maintained in CRM systems and MDM hubs. Natural language processing is used to identify discrete entities, such as customer names, job titles, company names, and professional skills, while matching algorithms automate the process of connecting the data with particular customer accounts. The process is therefore largely automated, but results can be flagged for human review and validation if desired. Importantly, the new social integration capabilities are only kicked off if the customer agrees to share his or her data with Informatica, and they are terminated if authorization is revoked, according to Ko.
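To make the matching step concrete, the following is a minimal sketch of how extracted entities might be scored against existing customer accounts, with uncertain matches routed to human review. It is not Informatica's algorithm: the field names, the thresholds, and the use of Python's `difflib` string similarity are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_profile(profile, accounts, auto_threshold=0.9, review_threshold=0.7):
    """Pick the best-scoring account for a social profile.

    The thresholds are hypothetical: scores at or above auto_threshold are
    linked automatically, scores between the two thresholds go to a human
    reviewer, and anything lower is treated as no match.
    """
    best_account, best_score = None, 0.0
    for account in accounts:
        # Average the name and company similarity (an illustrative choice).
        score = (
            similarity(profile["name"], account["name"])
            + similarity(profile["company"], account["company"])
        ) / 2
        if score > best_score:
            best_account, best_score = account, score

    if best_score >= auto_threshold:
        decision = "auto_link"
    elif best_score >= review_threshold:
        decision = "human_review"
    else:
        decision = "no_match"
    return best_account, best_score, decision


# Hypothetical CRM accounts and a profile extracted from a social network.
accounts = [
    {"id": 1, "name": "Jane Smith", "company": "Acme Corp"},
    {"id": 2, "name": "John Doe", "company": "Globex"},
]
account, score, decision = match_profile(
    {"name": "jane smith", "company": "ACME Corp"}, accounts
)
```

A production matcher would weigh more attributes and use domain-tuned similarity measures, but the pattern of automatic linking above one threshold and human review in a middle band mirrors the workflow described above.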
For machine-generated data, Informatica 9.5 can quickly add structure to binary data and other multi-structured data created by sensors, RFID chips and other devices for use in relational systems. This is particularly important for the utilities industry, which is now being bombarded with large volumes of usage data thanks to the advent of smart meters. This capability is due in part to HParser, Informatica’s Big Data parsing tool, which the company released last year and has now integrated into the core platform.
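As a rough illustration of what "adding structure" to binary device data involves, the sketch below unpacks fixed-width smart meter readings into rows suitable for a relational table. The record layout, field names, and sample values are hypothetical, and the code uses Python's standard `struct` module rather than HParser, whose interface is not described here.

```python
import struct
from datetime import datetime, timezone

# Hypothetical record layout (an assumption for illustration), little-endian:
#   meter id  : unsigned 32-bit integer
#   timestamp : unsigned 32-bit integer, seconds since the Unix epoch
#   usage     : 32-bit float, kilowatt-hours
RECORD = struct.Struct("<IIf")


def parse_readings(payload):
    """Turn a binary payload of fixed-width records into a list of row dicts."""
    if len(payload) % RECORD.size != 0:
        raise ValueError("payload length is not a whole number of records")
    rows = []
    for meter_id, ts, kwh in RECORD.iter_unpack(payload):
        rows.append(
            {
                "meter_id": meter_id,
                "read_at": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),
                "kwh": round(kwh, 3),
            }
        )
    return rows


# Two made-up readings packed into one binary payload.
sample = RECORD.pack(1001, 1338508800, 1.5) + RECORD.pack(1002, 1338508860, 0.25)
rows = parse_readings(sample)
```

Each resulting row has named, typed columns and could be loaded into a relational table; real device formats are typically variable-length and far less regular, which is where a dedicated parsing tool earns its keep.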
Helping Hadoop “Grow Up”
Informatica recognizes that most organizations working with Hadoop today are doing so in proof-of-concept projects or other experimental work. The next step is to move Hadoop PoCs into large-scale production, but doing so requires data management and processing capabilities that can scale securely and efficiently. By “big data processing,” therefore, Informatica is referring to new capabilities in version 9.5 designed to, as Ko puts it, “help Hadoop grow up.”
Among these features are new visual development capabilities. Informatica is used widely across industries to integrate traditional structured data into data warehouses and other analytic environments. This has resulted in a generation of data integration professionals who have grown up on, and are now dependent on, Informatica’s visual development capabilities for building and orchestrating integration jobs. Informatica 9.5 brings such capabilities, including a drag-and-drop graphical user interface, to Hadoop environments. Informatica-focused developers can now integrate and manipulate Hadoop-based data without needing to know how to write MapReduce jobs or otherwise hand-code. There are also new Hadoop management capabilities, making it simpler for administrators to archive cold data and take other steps to maximize Hadoop performance and minimize storage costs.
Most interestingly, Informatica 9.5 includes new data streaming capabilities aimed at allowing Hadoop to support real-time data analysis. The lack of real-time support is one of Hadoop’s major shortcomings. Hadoop is at its core a batch processing system. Administrators dump large chunks of data into Hadoop and wait for the data to be processed by the system. That data is therefore not available for analysis by data scientists and others until the processing has finished, which can sometimes take hours. With Informatica’s new streaming capabilities, analysts theoretically now have access to new data as it is created and loaded into Hadoop, allowing them to perform analysis and build applications that much more quickly.
Data Integration in the Era of Big Data
Clearly Informatica is no longer simply a data integration company in the traditional sense. The company has rightly recognized the new demands brought to bear by big data, which include ETL-style data integration but also easy-to-use tools, data quality, streaming data loading, data parsing and data governance at scale. As with all new platform versions, time and customer experience will tell how effective Informatica 9.5 truly is for tackling big data jobs, but from a functionality standpoint the company has covered a large swath of the big data integration and management capabilities currently lacking in the largely open source ecosystem.
The risk for Informatica, as with all commercial vendors attempting to move into the big data space, is that it is seen as an interloper in an ecosystem currently dominated by open source companies like Cloudera, Pentaho, and Hortonworks, and independent-minded developers with a near-fanatical devotion to open code. To overcome this potential hostility and to increase goodwill, Informatica should increase its level of community activity in the form of providing more thought leadership and educational resources in lieu of actually open sourcing its code-base, a step the vendor is highly unlikely to take.
Another danger is that the introduction of a proprietary, commercial data management platform into largely open source environments will upset the entire Hadoop value proposition, which is based on being able to inexpensively scale out deployments thanks to cheap commodity hardware. That said, the lack of easy-to-use data management and data processing capabilities as applied to big data is one of the major obstacles preventing more experimental Hadoop projects from transitioning to full-scale production deployments. Since applying big data across the enterprise to support mission-critical applications and business processes is where Hadoop’s real value lies, some enterprises may determine that paying for a proprietary platform, such as Informatica 9.5, to make the leap to Hadoop production deployments is worth the cost.
Action Item: Enterprises that are currently experimenting with big data but are struggling to chart a path to production deployments should closely evaluate Informatica 9.5 for this purpose. This is especially true of enterprises with teams of data integration pros already steeped in the Informatica platform. Whether to invest in 9.5 will ultimately depend on the speed at which return on investment is achieved. Where there are clear and valuable big data use cases that have the potential to deliver near-immediate returns, the cost of investing in Informatica 9.5 may well be justified. Further, the platform, in conjunction with Hadoop, should support multiple big data use cases in the future as projects develop, so users should consider the value of Informatica 9.5 as it is applied and reapplied over time. Where production use cases are still lacking, however, enterprises should hold off on investing in proprietary big data platforms and tools until such scenarios are identified. They should continue experimenting with open source big data tools and approaches, keeping an eye on platforms such as Informatica 9.5 for the time when the value calculations of such investments make sense.