HP announced a large number of products and product updates at HP Discover 2012 in Frankfurt. This research focuses on the storage elements of the announcements through the lens of their potential to contribute to a Software-Led Infrastructure. Wikibon picks out StoreAll as the most strategic and innovative announcement, and a potential game-changer in the management of unstructured and semi-structured Big Data.
Wikibon has just published research on Software-Led Infrastructure and on one of its components, Software-Led Storage. The fundamental premise is that data centers will migrate from the current model, where all storage services are provided by software within a specific "box" component (e.g., a storage array), to an infrastructure where system, storage, and network services are provided as software across the data center as a whole, and the software within each component is much reduced.
HP 3PAR Announcements & Directions
HP expanded the 3PAR storage array family with the StoreServ 7000 range. The 3PAR StoreServ 7400 is a two- or four-controller array with all the storage service features of the larger 3PAR arrays; the 7200 is a two-controller array with less redundancy. Entry pricing for the 7000 range is about $40,000.
One of the premises of Software-Led Storage is that by 2015 a strong majority of new arrays for active data will be flash-only. HP reflects that vision with the introduction of SSD-only arrays in the 7000 range. This is a first step, but to compete with the many flash-only vendors (e.g., Nimbus, SolidFire), HP will have to provide different housing and modules for flash components. Wikibon expects HP to follow the same direction as Hitachi and provide a separate flash module for 3PAR arrays that uses the same controllers and 3PAR storage services.
One of the interesting "quiet" additions to OnServ, the HP 3PAR Array Operating System Software, was a Web Services Toolkit and Representational State Transfer (REST) API, which allows integration with Software-Led Infrastructure automation tools through a standard interface. Initially this matters most to service providers and larger data centers. The scope of this API is not yet clear: to be an effective player in Software-Led Storage, HP will need to be comfortable with the whole array being managed from outside 3PAR, and with all maintenance and storage software services being available through this API.
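The source does not describe the API's resources, so the sketch below is hypothetical: the base URL, the `/volumes` path, the JSON field names, and the `X-Session-Key` header are placeholders, not documented 3PAR Web Services API details. It shows only the pattern that makes a REST interface useful to Software-Led Infrastructure automation: a tool outside the array builds a standard HTTP request to provision storage instead of driving a vendor-specific console. The request is constructed but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: the real 3PAR Web Services API paths, auth scheme,
# and payload fields are NOT taken from the source and may differ.
BASE_URL = "https://array.example.com/api/v1"


def build_create_volume_request(name: str, size_mib: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the array for a new volume."""
    body = json.dumps({"name": name, "sizeMiB": size_mib}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/volumes",  # hypothetical resource path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Session-Key": token,  # hypothetical session-token header name
        },
    )
```

An orchestration tool would send such a request with `urllib.request.urlopen(req)` against a real endpoint, after authenticating by whatever mechanism the array actually requires.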
HP StoreOnce Announcements & Directions
HP added new StoreOnce 2000 and 4000 Backup models, which extend HP StoreOnce de-duplication backup appliances to small and midsize businesses. Earlier this year, HP introduced HP StoreOnce Catalyst software, which enables data to be de-duplicated on application servers or backup servers before it is transferred to a centralized HP StoreOnce Backup system. This approach makes the de-duplication service available across a Software-Led Infrastructure, and potentially allows end-to-end de-duplication without having to rehydrate the data and then de-duplicate it again.
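Source-side de-duplication can be illustrated with a minimal sketch. StoreOnce Catalyst's actual chunking and hashing scheme is not described in the source; this example assumes fixed-size 4 KiB chunks fingerprinted with SHA-256, which is a simplification (many de-duplication systems use variable-size chunking). `DedupTarget`, `backup`, and `restore` are hypothetical names. The client hashes data locally, asks the target which fingerprints it lacks, and transfers only those chunks; the data stays de-duplicated at the target and is rehydrated only on restore.

```python
import hashlib

CHUNK_SIZE = 4096  # Assumption: fixed-size chunks for simplicity.


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and yield (digest, chunk) pairs."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class DedupTarget:
    """Hypothetical central backup store keeping one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def missing(self, digests):
        """Return the subset of digests the store does not yet hold."""
        return {d for d in digests if d not in self.chunks}

    def store(self, new_chunks):
        self.chunks.update(new_chunks)


def backup(data: bytes, target: DedupTarget):
    """Hash locally, then send only chunks the target lacks.

    Returns the recipe (ordered digests) needed to rebuild the data,
    and the number of bytes actually transferred.
    """
    pairs = list(chunk_hashes(data))
    recipe = [d for d, _ in pairs]
    needed = target.missing(recipe)
    to_send = {d: c for d, c in pairs if d in needed}  # duplicates collapse by key
    target.store(to_send)
    return recipe, sum(len(c) for c in to_send.values())


def restore(recipe, target: DedupTarget) -> bytes:
    """Rehydrate: reassemble the original bytes from the stored chunks."""
    return b"".join(target.chunks[d] for d in recipe)
```

Backing up the same data a second time transfers nothing, because every fingerprint is already present at the target.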
HP StoreAll Announcement & its Metadata Potential for Big Data
The most interesting announcement from a Software-Led Storage perspective is the availability of StoreAll. This is a repository for big data files and object data, ideal for the passive data retention storage layer shown in the middle column of the Software-Led Storage illustration in Figure 1. The StoreAll platform emphasizes scale (growth to over 1,000 nodes and 16 petabytes of data) and low storage cost (a claimed $0.90/GB).
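To put the claimed price in perspective, a back-of-the-envelope calculation multiplies the stated maximum capacity by the claimed unit cost. This assumes decimal units (1 PB = 10^6 GB) and that the $0.90/GB figure applies uniformly at full scale; the source does not say whether the figure refers to raw or usable capacity, or what it includes.

```latex
16\ \text{PB} \times 10^{6}\ \frac{\text{GB}}{\text{PB}} \times \frac{\$0.90}{\text{GB}} = \$1.44 \times 10^{7} \approx \$14.4\ \text{million}
```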
Access to the data is through a simple "Put & Get" REST API. One area HP is targeting is the "Sync/Share" ecosystem of ISVs such as Oxygen and Ctera, to provide an internal and more secure enterprise "Dropbox" environment (good luck getting users to comply with dropping Dropbox!). HP StoreAll is also certified and integrated with a number of ISV applications, including CommVault Simpana, iTernity iCAS, Symantec Enterprise Vault, Agfa HealthCare IDC, GE Healthcare EA and STS, McKesson HPF, and Genetec Omnicast. And, of course, with HP's Autonomy (more on that below).
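The source does not document StoreAll's REST paths or headers, so the sketch below models only the Put & Get semantics, in memory. `ObjectStore` and `sync_file` are hypothetical names. The content fingerprint plays the role an ETag-style value often plays in HTTP object stores, letting a Sync/Share-style client skip re-uploading files that have not changed.

```python
import hashlib
from typing import Optional


class ObjectStore:
    """Hypothetical in-memory stand-in for a Put & Get object interface.

    PUT stores bytes under a key; GET returns them. Real StoreAll REST
    details are not given in the source and are not modeled here.
    """

    def __init__(self):
        self._objects = {}  # key -> (fingerprint, bytes)

    def put(self, key: str, data: bytes) -> str:
        fingerprint = hashlib.sha256(data).hexdigest()  # ETag-style content fingerprint
        self._objects[key] = (fingerprint, data)
        return fingerprint

    def get(self, key: str) -> bytes:
        return self._objects[key][1]  # raises KeyError if absent (cf. HTTP 404)

    def fingerprint(self, key: str) -> Optional[str]:
        entry = self._objects.get(key)
        return entry[0] if entry else None


def sync_file(store: ObjectStore, key: str, data: bytes) -> bool:
    """Sync/share-style client step: upload only if content changed.

    Returns True if the object was uploaded.
    """
    if store.fingerprint(key) == hashlib.sha256(data).hexdigest():
        return False
    store.put(key, data)
    return True
```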
However, storing Big Data on disk without knowing what data is where is tantamount to sending data to die. Performing searches across petabytes of data, even with advanced Hadoop technology, is expensive in both resources and elapsed time. The key to the potential value of the StoreAll architecture for Big Data is its metadata services, shown in the third column of Figure 1.
StoreAll allows metadata extensions, so that users can define their own metadata. One use of this metadata is to define retention policies and ensure provenance with features such as WORM (write once, read many). The second, and strategically more important, use is to enable rapid data extraction from the repository.
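The first use, retention and provenance, can be illustrated with a small sketch that stores user-defined metadata alongside each file and refuses to overwrite or delete content before its retention period expires. This is a generic WORM model under stated assumptions, not StoreAll's implementation; `WormRepository`, `StoredFile`, and `RetentionError` are hypothetical names, and the policy details (for example, whether metadata may change during retention) are not covered by the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


class RetentionError(Exception):
    """Raised when a write or delete would violate a WORM retention period."""


@dataclass
class StoredFile:
    data: bytes
    metadata: dict = field(default_factory=dict)  # user-defined key/value pairs
    retain_until: Optional[datetime] = None  # content is immutable before this time


class WormRepository:
    """Hypothetical repository combining user-defined metadata with WORM retention."""

    def __init__(self) -> None:
        self.files: dict = {}

    def _check_retention(self, path: str, now: datetime) -> None:
        existing = self.files.get(path)
        if existing and existing.retain_until and now < existing.retain_until:
            raise RetentionError(f"{path} is retained until {existing.retain_until.isoformat()}")

    def write(self, path: str, data: bytes, metadata: Optional[dict] = None,
              retention: Optional[timedelta] = None, now: Optional[datetime] = None) -> None:
        now = now if now is not None else datetime.now(timezone.utc)
        self._check_retention(path, now)  # refuse to overwrite retained content
        retain_until = now + retention if retention is not None else None
        self.files[path] = StoredFile(data, dict(metadata or {}), retain_until)

    def delete(self, path: str, now: Optional[datetime] = None) -> None:
        now = now if now is not None else datetime.now(timezone.utc)
        self._check_retention(path, now)  # refuse to delete retained content
        del self.files[path]
```

Once the retention period has passed, the same `delete` call succeeds.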
The HP technology that provides this data extraction is Express Query, created by HP Labs. It can potentially make the location of files and objects available in real time across petabytes of data. Express Query can be used as a standalone query program or in conjunction with Big Data retention and analytic applications.
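How Express Query indexes metadata is not described in the source. As an illustration only, the sketch below uses a generic inverted index that maps each (attribute, value) pair to the set of matching paths, so a query touches only the relevant posting sets instead of scanning every file in the namespace. `MetadataIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical inverted index: (attribute, value) -> set of paths."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, path: str, metadata: dict) -> None:
        """Register a file's metadata attributes in the index."""
        for attr, value in metadata.items():
            self._postings[(attr, value)].add(path)

    def query(self, **conditions) -> set:
        """Return paths matching ALL attribute=value conditions."""
        # .get() avoids creating empty postings for unknown pairs.
        sets = [self._postings.get((a, v), set()) for a, v in conditions.items()]
        if not sets:
            return set()
        return set.intersection(*sets)
```

For example, `idx.query(modality="CT", site="Berlin")` intersects two posting sets rather than reading the metadata of every stored file.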
One proof point for the potential of StoreAll is its integration with HP Autonomy. A specific connector integrates Express Query with HP Autonomy IDOL to streamline the processing of dynamic content across large data sets. HP StoreAll Express Query's accelerated file namespace scan delivers inline updates to IDOL-based applications, so that new and changed data can be processed rapidly.
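The source does not explain how the accelerated namespace scan works internally. As a hypothetical illustration of the general idea of delivering only changes downstream, the sketch below diffs two namespace snapshots (each path mapped to modification time and size) and passes only added, modified, and deleted paths to a consumer callback that stands in for an indexing application such as IDOL. `diff_namespace` and `push_changes` are illustrative names, not HP or Autonomy APIs.

```python
def diff_namespace(previous: dict, current: dict):
    """Compare two namespace snapshots mapping path -> (mtime, size).

    Returns (added, modified, deleted) as sorted lists of paths.
    """
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(p for p in current if p in previous and current[p] != previous[p])
    return added, modified, deleted


def push_changes(previous: dict, current: dict, consumer) -> int:
    """Deliver only changed paths to a downstream indexer; return the change count."""
    added, modified, deleted = diff_namespace(previous, current)
    for path in added + modified:
        consumer("upsert", path)  # new or changed content to (re)index
    for path in deleted:
        consumer("delete", path)  # remove from the downstream index
    return len(added) + len(modified) + len(deleted)
```

The downstream application sees three events here instead of re-reading the whole namespace, which is the kind of saving that matters when loading data for analysis.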
HP StoreAll is integrated with Autonomy's meaning-based Consolidated Archive. This enables easier identification of potential business value within the data and streamlines processing for business processes like eDiscovery and regulatory compliance.
In tests, HP showed that the integration reduced the time to load data into Autonomy by a factor of 40. This is a potentially massive reduction in the elapsed time for extraction, transformation, and load (ETL) of data.
HP has indicated a commitment to a Software-Defined or Software-Led direction. The key to this is to move from forcing as much software as possible to be consumed at the box level to a "play nice" strategy, where services and boxes interact through stable APIs. One of the underlying technologies that must be in place is a clear system metadata strategy, with metadata that can be updated in real time and that spans both active and passive data. The potential benefits of such an approach are illustrated by the 40x reduction in Autonomy load times, and by the potential to design analytics that modify the behavior of operational transaction systems in real time.