In a companion alert (".pst file: The scourge of IT") I pointed out some of the significant benefits of using Volume Shadow Copy Service (VSS) as a basis for backing up Exchange databases. These include the ability to take backups without disrupting service and support for array-based snapshots, which increase the number and speed of copies.
However, Microsoft’s VSS technology is young and still has a way to go before it is operationally mature in large enterprises. One example is the limit on VSS concurrency, which HP’s tests have shown to be approximately 8 concurrent jobs. Running more jobs than this increases the probability of VSS timeouts, which usually means a failed backup, operator intervention, and a restarted backup stream. The problem is aggravated because VSS snapshots are taken at the volume level: if concurrent jobs for databases or storage groups in the same partition are started together, they can also cause VSS snapshot creation to time out and the backups to fail.
This particular limitation can be crudely overcome by offsetting the start times for VSS backups in scripts, but significant testing and operator training would be required to make this stable in a production environment.
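The offsetting approach can be sketched as a small scheduling routine. This is a minimal, hypothetical illustration, not a vendor tool: the names `BackupJob`, `plan_waves`, `start_offsets`, and the 15-minute `STAGGER_MINUTES` gap are assumptions, and the ceiling of 8 simply reuses the approximate figure from HP's tests. The routine groups jobs into "waves" so that no wave exceeds the concurrency ceiling and no two jobs in a wave share a partition, then assigns each wave a start offset. It assumes the gap is long enough for one wave's snapshots to be created before the next wave starts, and it only computes a schedule; it does not invoke any backup software.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_CONCURRENT = 8    # approximate VSS ceiling cited from HP's tests
STAGGER_MINUTES = 15  # hypothetical gap between waves; must be validated by testing


@dataclass(frozen=True)
class BackupJob:
    name: str    # hypothetical label for a database or storage group
    volume: str  # partition (volume) holding that database's data


def plan_waves(jobs, max_concurrent=MAX_CONCURRENT):
    """Split jobs into waves: at most max_concurrent jobs per wave,
    and never two jobs from the same volume in one wave."""
    by_volume = defaultdict(list)
    for job in jobs:
        by_volume[job.volume].append(job)

    waves = []
    while any(by_volume.values()):
        wave = []
        # Take at most one pending job from each volume for this wave.
        for volume in sorted(by_volume):
            if by_volume[volume] and len(wave) < max_concurrent:
                wave.append(by_volume[volume].pop(0))
        waves.append(wave)
    return waves


def start_offsets(waves, stagger_minutes=STAGGER_MINUTES):
    """Map each job name to its start offset (minutes after the window opens)."""
    return {job.name: i * stagger_minutes
            for i, wave in enumerate(waves)
            for job in wave}


if __name__ == "__main__":
    # Ten storage groups spread across three volumes.
    jobs = [BackupJob(f"SG{i}", f"Vol{i % 3}") for i in range(10)]
    for name, offset in start_offsets(plan_waves(jobs)).items():
        print(f"{name}: start at +{offset} min")
```

Separating jobs by volume matters as much as the overall cap here, because the text notes that simultaneous jobs on the same partition can trigger snapshot timeouts even below the concurrency limit. Any real deployment would still need the stress testing described below to find safe wave sizes and gaps.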
Action Item: VSS is here to stay and will be used increasingly as a key component in the Exchange ecosystem. However, large installations will need to do significant operational stress testing of VSS to ensure that it works not only in normal situations but also in degraded ones. One practical strategy is to keep Exchange setups as close as possible to Microsoft’s own internal email service implementation, because problems reported by Microsoft’s internal customers seem to get resolved the quickest.
Footnotes: *Prince of Morocco: "All that glisters is not gold." The Merchant of Venice (II, vii) (Shakespeare)