We have developed a model to evaluate the effectiveness of existing data reduction technologies. Our goal is to be able to rank ROI, defined as (marginal benefit)/(marginal cost). The model uses the concept of CORE, which stands for Capacity Optimization Ratio Effectiveness. CORE measures the effectiveness of a storage optimization technology as a function of the time and cost needed to achieve a desired capacity reduction. In this model, a user can trade efficiency of optimization for elapsed time and cost. In other words, it is a useful way to measure where more expensive reduction techniques will drive business value.
There are seven variables we consider in CORE:
- Storage Capacity (S) – By capacity, we mean the volume of data being optimized, as measured by the size of the dataset used to define the optimization index. In general, the larger the dataset used to define the index, the more CPU horsepower will be required to optimize the data. For example, when Data Domain ingests a backup stream, the ‘capacity’ could be measured in TBs; Falconstor and ProtecTier increase the size of the dataset by using more processing power.
- Percent Capacity Reduction (R) - The percent of targeted capacity that is eliminated, which is a function of the type of data being optimized. For example, backup data can be more highly optimized than multimedia files. In general, the more capacity being optimized, the greater the percent capacity reduction.
- Cost of Solution (C_s) - Cost of the data reduction solution, including processing power, memory, bandwidth, software, and redundancy. (Note: for the first pass we've left off operational costs.)
- Cost of Latency (C_l) - A scaling value based on the assumed cost of a second of added latency.
- Value of Capacity Reduced (V) - Measured as the cost/TB × the amount of data reduced, i.e. the dollar value of storage that is eliminated.
- Workload IOs/second (W) - Workload characteristics (in IO/second)
- Optimization Overhead seconds/IO (O) - The overhead of data optimization (in seconds/IO)
Here's the formula:
CORE = (S × R × V) ÷ (C_s + C_l × W × O)
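To make the arithmetic concrete, here is a minimal Python sketch of the formula as written. The class name, field names, and the `core` function are ours, chosen for illustration; we also assume R is supplied as a fraction (0.5 for 50%) rather than a percentage, and the numbers in the example are made-up placeholders, not measurements of any product.

```python
from dataclasses import dataclass


@dataclass
class CoreInputs:
    """The seven CORE variables (field names are illustrative)."""
    capacity: float           # S: volume of data being optimized
    reduction: float          # R: capacity reduction, assumed to be a fraction in [0, 1]
    value: float              # V: value of capacity reduced
    solution_cost: float      # C_s: cost of the data reduction solution
    latency_cost: float       # C_l: assumed cost of a second of added latency
    workload_iops: float      # W: workload in IO/second
    overhead_s_per_io: float  # O: optimization overhead in seconds/IO


def core(x: CoreInputs) -> float:
    """CORE = (S * R * V) / (C_s + C_l * W * O), exactly as in the formula above."""
    benefit = x.capacity * x.reduction * x.value
    cost = x.solution_cost + x.latency_cost * x.workload_iops * x.overhead_s_per_io
    if cost <= 0:
        raise ValueError("total cost must be positive to compute CORE")
    return benefit / cost


# Placeholder inputs, purely to exercise the function.
example = CoreInputs(
    capacity=100.0,
    reduction=0.5,
    value=1000.0,
    solution_cost=20000.0,
    latency_cost=1000.0,
    workload_iops=5000.0,
    overhead_s_per_io=0.001,
)
example_core = core(example)
```

With these placeholder inputs the numerator is 100 × 0.5 × 1000 = 50,000 and the denominator is 20,000 + 1000 × 5000 × 0.001 = 25,000, so CORE comes out at roughly 2. Raising W or O increases the latency term in the denominator and lowers CORE, which is the trade-off between optimization efficiency and elapsed time that the model is meant to expose.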
We have already modified the formula based on peer feedback. We are looking for further peer feedback on the concept and will begin applying some examples for further review. We envision doing the following:
- Calculate CORE for various technologies and workload types.
- Technologies we plan to cover include: Compression (post-process), Single Instancing, Data Deduplication (target), Data Deduplication (source), Compression (real-time)
- Workload types include: Archive, Backup, Primary
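As a rough sketch of how that comparison might be organized, the Python snippet below scores a few technology/workload pairs with the CORE formula and sorts them from highest to lowest. The helper names (`core`, `rank_by_core`) and every input value are hypothetical placeholders chosen only to show the mechanics; they are not results for any of these technologies. R is again assumed to be a fraction.

```python
def core(s, r, v, c_s, c_l, w, o):
    """CORE = (S * R * V) / (C_s + C_l * W * O)."""
    return (s * r * v) / (c_s + c_l * w * o)


def rank_by_core(candidates):
    """Return (key, CORE) pairs sorted from highest CORE to lowest."""
    scored = [(key, core(**params)) for key, params in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Keys are (technology, workload type); all values are invented placeholders.
candidates = {
    ("Single Instancing", "Archive"): dict(
        s=200.0, r=0.2, v=500.0, c_s=15000.0, c_l=10.0, w=100.0, o=0.001),
    ("Data Deduplication (target)", "Backup"): dict(
        s=100.0, r=0.9, v=1000.0, c_s=30000.0, c_l=100.0, w=1000.0, o=0.001),
    ("Compression (real-time)", "Primary"): dict(
        s=50.0, r=0.5, v=2000.0, c_s=10000.0, c_l=1000.0, w=20000.0, o=0.0005),
}

ranking = rank_by_core(candidates)
for (technology, workload), score in ranking:
    print(f"{technology} / {workload}: CORE = {score:.2f}")
```

Keying the table on (technology, workload type) keeps the two dimensions explicit, so the same technology can be scored separately for Archive, Backup, and Primary once real inputs are available.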
Action Item: testing
