One of the greatest operational challenges in modern data centers is copy management. The number of copies of data is proliferating. Part of the reason is the availability on all storage array platforms of tools to make space-efficient snapshots. Space-efficient snapshots were first introduced by NetApp and are logical copies of data based on metadata held in WAFL (Write Anywhere File Layout). Because only the delta changes between two snapshots need to be transferred, they enable much more efficient replication of data either locally or to remote sites. Another reason for so many copies is that while current disk drives have increased radically in density (with 4TB drives becoming the norm), the access density (the number of IOs and the amount of data that can be extracted from these drives in a unit of time) has remained the same or declined. To ensure copies of data can actually be used, physical copies of data have to be made. The average number of copies of data exceeds 10 in even a well-run data center.
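The delta idea behind snapshot-based replication can be illustrated with a minimal sketch. It assumes a toy model in which a snapshot is simply a mapping from block number to a content fingerprint; the function names and sample data are hypothetical and do not represent how WAFL or any particular array implements snapshots.

```python
# Toy model (an assumption for illustration): a snapshot maps block numbers
# to content fingerprints. Real arrays track changed blocks in their own metadata.

def snapshot_delta(older, newer):
    """Return what must be shipped so a replica holding `older` matches `newer`."""
    # Blocks that are new or whose content changed since the older snapshot.
    changed = {blk: fp for blk, fp in newer.items() if older.get(blk) != fp}
    # Blocks present in the older snapshot but gone from the newer one.
    deleted = sorted(blk for blk in older if blk not in newer)
    return changed, deleted


def apply_delta(replica, changed, deleted):
    """Apply a delta to a replica's copy of the older snapshot."""
    updated = dict(replica)
    updated.update(changed)
    for blk in deleted:
        updated.pop(blk, None)
    return updated


snap_monday = {0: "a1", 1: "b2", 2: "c3", 3: "d4"}
snap_tuesday = {0: "a1", 1: "b9", 2: "c3", 4: "e5"}

changed, deleted = snapshot_delta(snap_monday, snap_tuesday)
replica = apply_delta(snap_monday, changed, deleted)
```

In this example only two changed blocks and one deletion need to be sent, rather than the whole volume, which is the efficiency the paragraph above describes.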
The cost implications of these data copies are great. The challenges of managing all these copies are even greater. Finding snapshots uses the same principles as paper files - the newest one is on top, with the least amount of dust. Record keeping for snapshots (when they were taken, which developers or end-users have used them, whether they have been deleted, and the provenance of snapshots used in downstream processing and data warehousing) is extremely suspect in most organizations. This leaves security and compliance less than adequate.
The solution put forward by catalog software vendors is to keep track of all copies of data made and of the use made of every copy. The resultant metadata about the copies can be used to:
- Decrease the cost of storage and storage management;
- Reduce the amount of data that has to be backed up and the backup management costs;
- Improve the availability of copies to developers;
- Reduce the time-to-value for new applications and updates to applications.
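As a rough illustration of the kind of metadata such a catalog might hold, the sketch below records each copy, its source, its creator, and every access. All class, method, and volume names here are hypothetical; this is not the design of Catalogic ECX or any other product, only a minimal model of the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class CopyRecord:
    copy_id: str
    source_id: Optional[str]  # None marks an original, not a copy
    created_by: str
    created_at: datetime
    deleted: bool = False
    accesses: List[Tuple[str, datetime]] = field(default_factory=list)


class CopyCatalog:
    """Hypothetical in-memory catalog of data copies and their usage."""

    def __init__(self):
        self._records = {}

    def register(self, copy_id, source_id, created_by, created_at):
        if source_id is not None and source_id not in self._records:
            raise KeyError(f"unknown source: {source_id}")
        self._records[copy_id] = CopyRecord(copy_id, source_id, created_by, created_at)

    def record_access(self, copy_id, user, when):
        self._records[copy_id].accesses.append((user, when))

    def mark_deleted(self, copy_id):
        self._records[copy_id].deleted = True

    def provenance(self, copy_id):
        """Walk from a copy back to its original, returning the chain of ids."""
        chain = []
        current = copy_id
        while current is not None:
            chain.append(current)
            current = self._records[current].source_id
        return chain

    def never_accessed(self):
        """Live copies with no recorded access: candidates for review, not automatic deletion."""
        return sorted(
            r.copy_id
            for r in self._records.values()
            if r.source_id is not None and not r.deleted and not r.accesses
        )


catalog = CopyCatalog()
catalog.register("prod-vol", None, "storage-admin", datetime(2014, 1, 6))
catalog.register("snap-1", "prod-vol", "storage-admin", datetime(2014, 1, 7))
catalog.register("dev-clone", "snap-1", "alice", datetime(2014, 1, 8))
catalog.register("qa-clone", "snap-1", "bob", datetime(2014, 1, 8))
catalog.record_access("dev-clone", "alice", datetime(2014, 1, 9))

lineage = catalog.provenance("dev-clone")
idle = catalog.never_accessed()
```

Here `lineage` traces the developer clone back through the snapshot to the production volume, and `idle` lists copies nobody has touched, which is the sort of information that supports both cost reduction and compliance reporting.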
Wikibon investigated the catalog software to quantify the benefits in one of the most difficult areas to manage - filer systems.
The result of the analysis of 20 filers is shown in Figure 1. The hardware cost of 20 filers with 800TB of storage, together with maintenance, is assumed to be $1.85 million. The potential net benefits of applying the technology are assessed by Wikibon to be about $2.55 million, in addition to the less tangible benefits of improved provenance, improved auditability, and much improved ability to demonstrate compliance. The reference architecture for this study was Catalogic ECX software (a spin-out from Syncsort), which only operates on NetApp equipment at the moment. Clearly, supporting other storage platforms would give this technology broader appeal.
Wikibon recommends that senior IT executives look in detail at catalog software as a strategic component of data management.
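To put the headline figures in proportion, the short calculation below uses only the numbers quoted above (20 filers, 800TB, $1.85 million, $2.55 million). The per-filer and per-terabyte breakdowns are simple divisions added for illustration and are not figures published in the study.

```python
# Figures quoted in the text above.
filers = 20
capacity_tb = 800
hardware_and_maintenance = 1_850_000  # dollars
net_benefit = 2_550_000               # dollars

# Net benefit relative to what the filers cost to buy and maintain.
benefit_to_cost_ratio = net_benefit / hardware_and_maintenance

# Illustrative breakdowns (derived here, not taken from the study).
benefit_per_filer = net_benefit / filers
benefit_per_tb = net_benefit / capacity_tb

print(f"Benefit-to-hardware-cost ratio: {benefit_to_cost_ratio:.2f}")
print(f"Net benefit per filer: ${benefit_per_filer:,.0f}")
print(f"Net benefit per TB: ${benefit_per_tb:,.2f}")
```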
Methodology for Evaluation of Catalog Software
Wikibon looked at four areas of benefit for catalog software. These were:
- Improved Filer Storage & Management:
Catalog software helps to reduce the number of copies of data files and ensure a full history of when, where and by whom copies are made. This allows simpler compliance and provenance processes and procedures. Using the assumptions in Table 1 in the Footnotes, Wikibon calculated:
- The reduction in filer hardware & software acquisition costs;
- The reduction in operational costs for filer management.
- Improved Filer Backup & Management:
Catalog software also helps reduce the amount of data to be backed up. By reducing the amount of data that has to be processed through (for example) de-duplication appliances, the number of appliances and the operational costs of managing those appliances are reduced. Using the assumptions in Table 1 in the Footnotes, Wikibon calculated:
- The reduction in filer hardware and software backup acquisition costs;
- The reduction in operational costs for filer backup and management.
- Improved Access to Data for Developers, QA & Testing:
The use of space-efficient snapshots allows key copies of operational data to be published for developers, QA and testing. Catalog software can ensure that the latest copy of (say) a set of production files is consistent and that the same data is used by all the members of an application development team, including developers, quality assurance and formal testing. Previous Wikibon research has found that application development team members devote 40% of their time to copy management of data. Using the assumptions in Table 1 in the Footnotes, Wikibon calculated:
- A decrease in the development time spent accessing timely and consistent data from 40% to 16% of total time, leading to smaller development teams and faster time-to-value.
- Impact of Earlier Time-to-Value for Applications and Updates on Business User Productivity & Value to Business:
All applications decline in efficiency over time and need new versions to adapt to changing business environments. In previous research, Wikibon has assessed that the efficiency of application end-users declines by about 10% per year while they are using the application. If an application is used heavily (say 10% of the time), the overall impact on end-user productivity is 1%. The business benefit of such productivity improvements is more than just greater efficiency for the end-user. New versions of software bring better functionality to the end-user and to the business as a whole. Using the assumptions in Table 1, Wikibon calculated:
- An increase in productivity worth 24% of end-user costs while they are using applications, from earlier delivery of initial and updated application functionality (assuming that end-users use applications 5% of the time, for an overall increase in productivity of 1.2%).
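The percentages in the last two benefit areas can be checked with a few lines of arithmetic. The sketch below uses only figures quoted above; the equivalent team size is an added illustration that assumes the amount of productive development work stays the same, which the study does not state explicitly.

```python
def overall_impact(change_while_using, share_of_time_using):
    """Productivity change across all working time, given the change while using an application."""
    return change_while_using * share_of_time_using


# Efficiency declines 10% per year for an application used 10% of the time.
yearly_decline = overall_impact(0.10, 0.10)

# A 24% gain while using applications, with applications used 5% of the time.
catalog_gain = overall_impact(0.24, 0.05)

# Developer time spent on copy management falls from 40% to 16%.
copy_mgmt_before, copy_mgmt_after = 0.40, 0.16
reduction_in_copy_mgmt = (copy_mgmt_before - copy_mgmt_after) / copy_mgmt_before

# Assumption: the same productive work must still be delivered, so the
# equivalent team size scales with the ratio of productive time fractions.
productive_before = 1 - copy_mgmt_before
productive_after = 1 - copy_mgmt_after
relative_team_size = productive_before / productive_after

print(f"Overall yearly decline: {yearly_decline:.1%}")
print(f"Overall gain from earlier delivery: {catalog_gain:.1%}")
print(f"Reduction in copy-management time: {reduction_in_copy_mgmt:.0%}")
print(f"Equivalent team size: {relative_team_size:.1%} of the original")
```

Under that assumption, a team would need roughly 71% of its original headcount to deliver the same productive output, which is consistent with the claim of smaller development teams.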
The results of these assumptions and calculations are shown in the next section.
Financial Benefits of Catalog Software
Figure 1 in the executive summary was derived from the data in Figure 2 below, as the difference between the three-year total costs with and without catalog software. Figure 2 is based on the assumptions in Table 1 in the Footnotes, using the methodology laid out in the previous section. The overall conclusion from Figure 2 is that not having catalog software increases the total three-year Filer & Backup Costs by 36%.
Wikibon finds that the benefit of catalog software increases with the number of filers. Figure 3 below shows the total cost of filers with and without catalog software, and the blue dotted line on the right-hand axis shows the increased percentage benefit as the number of filers increases from 1 to 15.
In Figure 4 the blue dotted line on the right-hand axis shows the increased percentage benefit as the number of filers increases from 1 to 50. At 50 filers, the additional cost is nearly 60%.
Catalog Software Conclusions
The result of this Wikibon analysis shows the potential benefits of storage catalog software to be very large, especially for the unstructured and semi-structured data typically found in storage filers. The greater the number of filers, the greater the potential benefit. At 20 filers, the total cost benefits already exceed the hardware and maintenance costs of the filers. In addition, there are less tangible but strategically vital benefits of improved provenance, improved auditability and much improved ability to demonstrate compliance.
Strategically, it is equally important that all data, both within the organization and held in the cloud or hybrid cloud, have the same catalog capabilities to monitor the use and copying of, and access to, data. Catalog software needs to be an integral part of the hybrid cloud orchestration and automation strategy, and supported by all major software-led infrastructure initiatives.
Action Item: Wikibon recommends that senior IT executives look in detail at catalog software as a strategic component of data management.