A service level agreement (SLA) is a contractual agreement between the providers and consumers of storage. SLAs govern performance, availability, recoverability, cost, and other critical business metrics. It is widely accepted that the provision and consumption of application services should be governed by an explicit agreement, because such an agreement defines the attributes of a service for both parties (the provider and the consumer).
Storage services make an important contribution to application services. This note is designed to address the following key issues:
- What are the advantages and disadvantages of establishing SLAs? When are they a good idea, and when should you wait?
- What are the alternative ways of implementing SLAs, e.g., storage measures (response time by storage tier) vs. business measures (response time and outages by application)?
- What is the business impact of using SLAs and how can they be justified to management?
- What are good examples of effective SLAs?
- How should SLAs be measured and reported back to the business? Should the metrics be changed?
Today’s organization receives or delivers storage services both inside and outside of the organization. A key challenge is ensuring that these services are provided as expected with regard to timeliness, quality, and completeness of delivery, and that the rights and responsibilities of the service provider and recipient are clearly defined. A service level agreement aims to do this by clearly defining the expectations, rights, and responsibilities of all parties to the agreement. Penalties for non-compliance and arrangements for charging service costs are normally also part of the SLA. The storage service level agreement is written in specific terms: it defines services and requirements for performance, customer support and problem resolution, change control, security, disaster handling, and other requirements of provider and recipient across all storage tiers (technical definition) and applications (business definition).
Storage SLA capability
Most agreements with external service providers are subject to a contract, but standard contracts typically only specify the basic terms and conditions of the service, pricing for the service, and payment and dispute mechanisms. They do not describe in detail what the customer expects of the provider, and in turn what the provider expects from the customer, to ensure that everyone is satisfied with the service. Agreements with internal service providers are usually even less formal. The service level agreement fills this gap and can lead to better service by defining expectations and by imposing penalties for non-compliance, which motivates each party to keep to its end of the agreement.
Specific operational goals of storage service level agreements
Key operational goals of a storage service level agreement should include:
- Reducing storage costs, and cutting expenses arising from disputes with service providers.
- Receiving measurably better service in terms of timeliness of delivery, quality, and completeness of service.
- Enjoying measurably better and quicker customer service due to clear expectations on both sides.
- Reducing dependencies on key staff at the customer and provider to supply good service and resolve issues by documenting and agreeing key deliverables, expectations, rights, and responsibilities.
- Demonstrating good business and technology controls to auditors and regulators.
- Providing key performance indicators for use in a quality control or improvement initiative like Six Sigma.
- Gaining a verifiable basis for resolving disputes, and for exiting the service contract if required.
Risks of implementing Storage SLAs
Many organizations are reluctant to deploy storage SLAs internally or with service providers with whom they have long-standing relationships. In the former case, drawing up internal contracts seems overly bureaucratic to some; in the latter, there is concern that formalizing a long-standing relationship risks “insulting” the service provider and degrading the overall relationship. For internal SLAs, many organizations address this concern by avoiding objectionable penalty clauses, choosing instead to define and agree deliverables, expectations, rights, and responsibilities. For external service level agreements, organizations can assure their long-standing providers that there has been no sudden loss of trust, and that the SLA is an initiative to strengthen controls, a practice that is increasingly accepted. Some customers that face substantial resistance have had success by “blaming” the need for an SLA on their auditors and regulators – something most businesspeople can understand.
The Storage SLA initiative
The business driver to initiate storage service level agreements may originate from end users of such services (concerned with business requirements for archiving, data storage, data warehouse design, etc.), from within an information technology department looking to better control service delivery and receipt, from a senior management initiative to improve service delivery or reduce expense, or from auditors and regulators seeking a more controlled operational environment. Whatever the source, implementing an SLA proceeds through three phases: needs analysis, SLA design and documentation, and deployment and monitoring.
Expectations (Out-of-scope)
For a service level agreement to be written there must be a fairly stable understanding of the services to be provided and pricing for those services. Also, an SLA presupposes both the ability to measure service levels and the capability to monitor them. Therefore:
- The set of services to be delivered from provider to customer must be well-defined at a technical and business level, and there should be no material debate about the scope of services.
- Each service measurement criterion to be defined must be measurable by the provider and / or the customer.
- The customer must have the ability and readiness to monitor service levels and escalate disputes where necessary. An unmonitored agreement is likely to be ignored and will lose its validity.
Analyze phase
The scope of storage services should be defined, along with the place and manner of service delivery and an explanation of how the customer uses these services. Services may include the architecture, design, deployment, or operation of data storage and consolidation, replication, backup and recovery, data copy for user testing, security, monitoring, performance analysis, storage resilience design, NAS / SAN design, or storage area management solutions. Services may also include ancillary functions like customer support, technical support, provision of hardware / networks / software / staff, and so on. Where applicable, the scope should also encompass on-request or event-driven services like data migrations, data cleansing, disaster recovery and operational incident handling, requests for change and change management, scheduled maintenance, testing, and status reporting. A managed services environment may include provision of most or all of these services. In every case, all providers and users of each service should be specified; aspects like customer support may involve multiple organizations at the service provider level and at the customer level.
Acceptance test considerations
The Analyze Phase is complete when the customer / service recipient is satisfied that the list of services is complete, and when the service provider agrees with the contents of this list. At times, formulating an SLA leads to new services being added to the master services contract, because the service provider would not otherwise provide them in the normal course of business.
Key analysis milestones
Milestones in the Analyze Phase typically include the following:
- Review of current provider contract / agreement.
- Review of current provider services and deliverables.
- Input on scope of services from end users, management, information technology.
- End user input on future trends affecting the service.
- Agreement on who provides which aspects of each service, and who the end customers are.
- Staffing of an "SLA project" including a sponsor, project manager, and project team.
Design phase
Once service scope has been detailed, the next step is to define and agree service levels for each provided service. This can include metrics like input / output operations per second (IOPS), retrieval / access time from each storage medium, transfer rates, replication frequencies, backup frequencies, maximum storage / processing loads, storage capacity, hours of operation, amount of storage per user, availability / uptime of each storage tier, time to respond to service problems, time to recover from disaster, time to restore to production from backups, time to restore data to test environments, and time to repair. Quality of service metrics like error rates should be specified as well. These can be defined at a granular level of detail for each applicable business operation or transaction type (lower IOPS or longer response times may be acceptable for some and not others). Growth of storage, breakdown of storage by tier or business usage, and cost per unit of storage are often important metrics in managed storage environments.
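As an illustration only, the metrics above could be recorded as explicit targets and checked against measured values. The sketch below is a minimal example, not a prescribed format: the tier names, the target values, and the names `ServiceLevel`, `availability_percent`, and `evaluate` are all hypothetical, and real targets would come out of the negotiation between customer and provider.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class ServiceLevel:
    """One agreed metric for one storage tier (all values are illustrative)."""
    tier: str
    metric: str
    target: float
    unit: str
    # "min": measured value must be >= target (e.g. IOPS, availability).
    # "max": measured value must be <= target (e.g. response time).
    direction: Literal["min", "max"]

    def is_met(self, measured: float) -> bool:
        if self.direction == "min":
            return measured >= self.target
        return measured <= self.target


def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Availability over a reporting period, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


# Hypothetical targets, invented for this example.
SERVICE_LEVELS = [
    ServiceLevel("tier1", "iops", 20_000, "ops/s", "min"),
    ServiceLevel("tier1", "response_time", 5.0, "ms", "max"),
    ServiceLevel("tier1", "availability", 99.9, "%", "min"),
    ServiceLevel("tier2", "response_time", 20.0, "ms", "max"),
]


def evaluate(levels, measurements):
    """Return (service_level, measured, met) for each level that has a measurement."""
    results = []
    for level in levels:
        key = (level.tier, level.metric)
        if key in measurements:
            measured = measurements[key]
            results.append((level, measured, level.is_met(measured)))
    return results


if __name__ == "__main__":
    # A 30-day month with 50 minutes of tier1 downtime (made-up figures).
    measurements = {
        ("tier1", "iops"): 22_500,
        ("tier1", "response_time"): 6.2,
        ("tier1", "availability"): availability_percent(30 * 24 * 60, 50),
        ("tier2", "response_time"): 14.0,
    }
    for level, measured, met in evaluate(SERVICE_LEVELS, measurements):
        status = "met" if met else "MISSED"
        print(f"{level.tier} {level.metric}: {measured:.2f} {level.unit} "
              f"(target {level.target} {level.unit}) -> {status}")
```

Recording the direction of each target explicitly avoids the common confusion between "higher is better" metrics such as IOPS and "lower is better" metrics such as response time. In the made-up figures above, 50 minutes of downtime in a 30-day month gives roughly 99.88% availability, which misses a 99.9% target.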
Next, roles and responsibilities of customers and providers need to be defined; remember to consider both day-to-day processes and less frequent incidents or processes (such as change management or year end processing). Regular and emergency contact information for key customer and provider personnel and departments should be included. Finally, any cost chargeback arrangements (from provider to customer, or internally between customer departments) should be clearly spelled out; if penalties for service level non-compliance will be implemented, these should be described too.
Acceptance test considerations
The Design Phase is complete when service levels and roles and responsibilities are defined and agreed between customer and provider for each service in scope. Each service level metric must be measurable somehow in order for it to be meaningfully monitored post-deployment. In addition, key contact information should be written into the SLA, as should any arrangements for cost chargebacks or financial penalties.
Key design milestones
Milestones in the Design Phase include the following:
- Measurable metrics are defined and agreed.
- Customer or provider demonstrates how each metric can be collected and monitored and by whom.
- Roles and responsibilities are defined and agreed.
- Key contact information for each service provider and affected customer organization is documented.
- Cost chargebacks are specified (when, how, and how much).
- Penalties are defined (triggers, penalty amount, and billing process).
- Processes for changing the SLA over time, and criteria for exit of the SLA, are established.
- Prior to deployment, service level agreements with external providers should be reviewed by the customer’s legal counsel, as they act as amendments to the existing service contract.
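To make the chargeback and penalty milestones concrete, the following sketch shows one way a monthly charge and an availability-based penalty might be computed. It is an assumption-laden example rather than a recommended schedule: the penalty bands, the per-gigabyte rate, and the function names `penalty_credit_percent`, `chargeback`, and `monthly_invoice` are all hypothetical.

```python
from decimal import Decimal

# Hypothetical penalty schedule: (availability floor in %, credit as % of the
# monthly charge). Bands are checked from the highest floor down; the first
# floor that the measured availability reaches determines the credit.
PENALTY_BANDS = [
    (Decimal("99.9"), Decimal("0")),   # target met: no penalty
    (Decimal("99.5"), Decimal("5")),
    (Decimal("99.0"), Decimal("10")),
]
WORST_CASE_CREDIT = Decimal("25")      # applies below the lowest floor


def penalty_credit_percent(availability: Decimal) -> Decimal:
    """Trigger: look up the credit percentage for a measured availability."""
    for floor, credit in PENALTY_BANDS:
        if availability >= floor:
            return credit
    return WORST_CASE_CREDIT


def chargeback(used_gb: Decimal, rate_per_gb: Decimal) -> Decimal:
    """Usage-based monthly charge, rounded to cents."""
    return (used_gb * rate_per_gb).quantize(Decimal("0.01"))


def monthly_invoice(used_gb: Decimal, rate_per_gb: Decimal,
                    availability: Decimal) -> dict:
    """Combine the chargeback and any penalty into one billing record."""
    charge = chargeback(used_gb, rate_per_gb)
    credit_pct = penalty_credit_percent(availability)
    penalty = (charge * credit_pct / Decimal("100")).quantize(Decimal("0.01"))
    return {"charge": charge, "penalty": penalty, "net": charge - penalty}


if __name__ == "__main__":
    # Made-up month: 12,000 GB at 0.08 per GB, with 99.62% availability.
    invoice = monthly_invoice(Decimal("12000"), Decimal("0.08"), Decimal("99.62"))
    print(invoice)
```

`Decimal` is used instead of floating point so that billing amounts round predictably. Writing the trigger bands as data rather than as branching logic also makes them easy to review alongside the text of the agreement.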
Deploy phase
There are generally three ways to deploy service level agreements. The first is to agree with the provider to simply monitor service levels and observe roles and responsibilities per the agreement for a period of time, without cost chargebacks or penalties. The purpose is to validate the effectiveness and efficiency of the SLA as written. After this trial period, the agreement can be modified by the parties. The second way is to “pilot” the agreement, which can mean rolling it out only for certain services or customer organizations, collecting feedback, making modifications, and then rolling it out more fully. The third way of deploying an SLA is a “big bang” approach, where it simply goes into effect on an agreed date. This should only be used when there is a high level of confidence in the SLA, and where there is a good and established relationship between provider and customer (just in case!). Whichever method is chosen, customer and provider groups must activate their monitoring capability once the SLA is deployed.
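The three deployment approaches can be summarized as a small configuration model that answers two questions for any service: is it monitored, and are penalties enforced? This is a sketch under stated assumptions. The names `Strategy` and `Rollout` are hypothetical, and the choice to enforce penalties for piloted services is an assumption of this example, since a pilot could equally run without them.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    TRIAL = "trial"        # monitor everything, no chargebacks or penalties
    PILOT = "pilot"        # SLA applies to a chosen subset of services
    BIG_BANG = "big_bang"  # SLA applies to every service from the agreed date


@dataclass(frozen=True)
class Rollout:
    strategy: Strategy
    all_services: frozenset
    pilot_services: frozenset = frozenset()

    def monitored(self) -> frozenset:
        """Services whose service levels are actively monitored."""
        if self.strategy is Strategy.PILOT:
            return self.all_services & self.pilot_services
        return self.all_services

    def penalties_enforced(self, service: str) -> bool:
        """Whether chargebacks / penalties apply to the given service."""
        if self.strategy is Strategy.TRIAL:
            return False
        # Assumption: in a pilot, penalties apply to the piloted services.
        return service in self.monitored()


if __name__ == "__main__":
    services = frozenset({"backup", "replication", "tier1_storage"})
    pilot = Rollout(Strategy.PILOT, services, frozenset({"backup"}))
    for name in sorted(services):
        print(name,
              "monitored" if name in pilot.monitored() else "not monitored",
              "| penalties:", pilot.penalties_enforced(name))
```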
Acceptance test considerations
The Deploy Phase has been successfully implemented when the service level agreement is being actively monitored – in whole or in part, per agreement between customer and provider – and it appears that the SLA is realistic and effective. For SLAs with chargeback and penalty provisions, deployment is final when these have been activated, and costs / penalties are being charged per the agreement.
Key deployment milestones
Milestones in the Deploy Phase include the following:
- Determine which deployment strategy will be used.
- Confirm staffing and readiness.
- Activate service level monitoring.
- Upon final acceptance of the process, activate chargebacks and penalties if used.
Initiative summary
Depending upon the maturity of the customer organization’s IT capability and staffing levels, and upon the willingness of the service provider to cooperate, the time to implement the SLA from analysis to deployment can range from a few weeks to several months. Some organizations choose to pilot the SLA for a longer period of time so as to collect data and assess the SLA over various production cycles (month end, quarter end, etc.) and ad-hoc events (data recovery, system upgrade, system change, etc.). If the customer does not have the staff or technology resources to monitor service levels and calculate chargebacks / penalties, there may be a need to redistribute work, increase staffing, or procure technology to assist with this and assure the success of the SLA deployment. In fact, increased staffing and / or the purchase of tools to monitor the storage SLA are normally the only costs associated with such an SLA. Many organizations are able to avoid these costs by assigning the monitoring of the SLA to the IT group already responsible for production environment monitoring, and by using tools that are part of the storage management software itself to do this monitoring.