Tables Included
Below are two tables which show backup performance in two separate environments:
- Symantec OST environment using Ethernet and offloading some de-duplication processing to a media server front-end,
- Traditional VTL environment.
The reason for splitting out these environments is that the two worlds are different. Users are unlikely to make a forklift replacement of their existing backup software environment to meet de-duplication requirements, as the cost, risk and disruption of such a change are very high. Separate tables therefore show the options more clearly within the appropriate software environment.
Metrics Used
The figures published are maximum vendor claims, undoubtedly with optimum configurations and tuning. The vendor figures all use TB to mean 10^12 bytes. Gilbert’s immortal line “She may very well pass for forty-three in the dusk with a light behind her” applies to all the figures in the tables below. The metrics included are:
- Inline Backup Speed (TB/hr) – for solutions that have a single inline step,
- Post-process Ingest Speed (TB/hr) – for solutions that ingest at a higher speed first, and then de-duplicate,
- Post-Process De-duplication Speed (TB/hr) – for solutions that have a prior ingest phase,
- Daily Backup Capacity (TB/Day) – the hourly de-duplication speed in TB/hour times 24 hours,
- Raw Usable Capacity (TB) – the figure before protection, formatting and other subtractions, and does not include the impact of de-duplication.
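The Daily Backup Capacity metric defined above is simple arithmetic, and a minimal Python sketch may make it concrete. The function name and the sample speeds are illustrative assumptions, not figures from the tables.

```python
HOURS_PER_DAY = 24

def daily_backup_capacity(dedupe_tb_per_hour: float) -> float:
    """Daily Backup Capacity (TB/day) = hourly de-duplication speed x 24 hours."""
    return dedupe_tb_per_hour * HOURS_PER_DAY

# Hypothetical speeds in TB/hr, chosen only to illustrate the calculation.
inline_speed = 10.0        # an inline system: its single inline step is the dedupe speed
post_process_speed = 2.5   # a post-process system: use its de-duplication speed, not ingest

print(daily_backup_capacity(inline_speed))        # 240.0
print(daily_backup_capacity(post_process_speed))  # 60.0
```

Note that for a post-process system the sketch uses the de-duplication speed rather than the faster ingest speed, following the definition in the list above.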
Global De-duplication
Global de-duplication allows a single directory to be shared across two or more nodes and allows the system to load balance data de-duplication across the nodes. For single-node systems, only the single-node performance is given: even though many systems can be stacked in a rack, they act as separate systems and have to be managed separately. Each backup source has to be pointed at its specific target box, or the directory is lost.
The Comparison Tables
No single metric can describe the performance of de-duplication solutions for all backup products, for all workloads, and for all customer environments. The two main groups of software (Symantec/OST and VTL) have been separated out. The post-process solutions have an additional step that ingests the data first; their argument is that ingest speed is the critical factor in determining the backup window, and that there is additional time between backup windows to do the de-duplication stage. For some situations this will be valid. However, not completing the total backup process during the backup window changes the RPO and RTO that the solution can achieve. On balance, it would appear that the simplest comparison metric is the amount of data that can be de-duplicated in a set period, set at one day (24 hours). This is the metric first suggested and used by Curtis Preston in the Backup Central blog. Each table is sorted by daily backup capacity, which is bolded.
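The trade-off for post-process solutions can be sketched in Python as a rough illustration. The function name, the data volume and both speeds are hypothetical, and the sketch simply reports each stage's duration without assuming whether a given product overlaps the two stages.

```python
def stage_hours(data_tb: float, ingest_tb_per_hour: float, dedupe_tb_per_hour: float):
    """Return (ingest_hours, dedupe_hours) for one backup of data_tb terabytes."""
    ingest_hours = data_tb / ingest_tb_per_hour   # time the backup window must cover
    dedupe_hours = data_tb / dedupe_tb_per_hour   # time needed for the de-duplication stage
    return ingest_hours, dedupe_hours

# Hypothetical post-process system: fast ingest, slower de-duplication.
ingest_h, dedupe_h = stage_hours(data_tb=48.0, ingest_tb_per_hour=8.0, dedupe_tb_per_hour=4.0)

print(ingest_h)        # 6.0  hours of backup window
print(dedupe_h)        # 12.0 hours of de-duplication
print(dedupe_h <= 24)  # True: the nightly data can be de-duplicated within one day
```

In this made-up case the backup window is short, but the backup is not fully de-duplicated until well after the window closes, which is the RPO/RTO concern raised above.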

Source: Wikibon 2011. Format & original metrics taken and modified from an original table created by Curtis Preston (BackupCentral), downloaded 1/18/2011 from http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/348-tar.

Source: Wikibon 2011. Format & metrics taken and modified from an original table created by Curtis Preston (BackupCentral), downloaded 1/18/2011 from http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/348-tar.
Conclusions
Figure 1 shows the top four Symantec/OST de-duplication performers, and the top two VTL de-duplication performers. NEC is the leader by far, nearly four times faster than second place using a 55-node global de-duplication system, even though the system is not well known in the USA and occupies a massive 11 frames! In second place is the newly announced EMC DD890 GDA, with the Symantec NetBackup 5000 close behind. The top three positions in the Symantec/OST group were all inline de-duplication solutions. The first post-processing solution was the newly announced Sepaton 8-node system in fourth position.
In first position in the VTL table and fifth position overall was the same Sepaton 8-node system running in VTL mode. This was twice as fast as the EMC DD890 GDA running in VTL mode, which had a very poor 2 node/1 node ratio of only 1.32.
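To put the 2 node/1 node ratio in context, the following Python sketch compares a ratio against ideal linear scaling, where doubling the nodes would double the throughput. The function names are illustrative assumptions; only the 1.32 ratio comes from the text above.

```python
def scaling_ratio(multi_node_tb_per_hour: float, single_node_tb_per_hour: float) -> float:
    """Throughput of the multi-node system divided by that of a single node."""
    return multi_node_tb_per_hour / single_node_tb_per_hour

def scaling_efficiency(ratio: float, nodes: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return ratio / nodes

# The 2 node/1 node ratio quoted above for the EMC DD890 GDA in VTL mode.
efficiency = scaling_efficiency(1.32, nodes=2)
print(round(efficiency, 2))  # 0.66
```

On this reading, the second node adds only about a third of a node's worth of throughput, against an ideal ratio of 2.0.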
Action Item: The NEC system should be looked at by large installations that have very aggressive RPO and RTO requirements and tight backup windows. For Symantec/OST environments, the EMC DD890 is a clear leader, although the newly announced Sepaton 8-node system could be a consideration in environments with tight backup windows and looser RPO/RTO requirements. For VTL environments, the newly announced Sepaton 8-node system is a clear leader.
Footnotes: Data from Tables 1 and 2 was used in the EMC Data Domain De-duplication 2011 Wikibon professional alert.
IBM ProtecTIER TS7650G data was updated in Table 2 and added to Table 1.