What is an SLA?
The SLA (Service Level Agreement) is a contract that defines the service levels a service provider commits to. It is concluded between the software publisher and the hosting provider, and also between the publisher and the end customer.
This document becomes essential as soon as the customer depends on the continuity of the publisher’s and the host’s services to carry out its business, particularly with a SaaS software offering.
Indeed, software unavailability can be extremely harmful for a company. Hence the need to strictly regulate downtime due to breakdowns or maintenance, during which the end customer can no longer access its management system.
What is an SLA for?
An SLA defines the publisher’s commitments regarding the availability and performance of the software offered, as well as maintenance and support. In concrete terms, it must describe how the service provider plans to assist its client, within what scope and within what deadlines. Most often, the intervention procedures vary according to the criticality of the incident or technical problem encountered.
This document is also a guarantee of transparency, seriousness and credibility for the publisher, which can thus establish a genuine relationship of trust with its client for the duration of the contract, while preventing possible disputes.
What is the content of an SLA?
The content of the service level agreement varies depending on the type of software and the services provided. However, it must include a number of essential items:
- The nature of the services provided and their extent
- Service levels and performance indicators to measure them
- The criteria used to measure indicators
- How the customer is informed in the event of an incident
- Monitoring and reporting methods
- The schedule of maintenance operations
In any case, it is essential to write an SLA that fully meets the needs and expectations of the customer, without overestimating the capacities of the publisher and the host.
What are the main components of an SLA?
In the case of cloud-hosted software, the SLA makes it possible to specify a number of commitments, the most important of which are described here.
PSG (guaranteed service range)
PSG defines the period during which the service is guaranteed and available. For example:
- 5 days a week or 7 days a week.
- From 8 a.m. to 6 p.m., 7 a.m. to 8 p.m. or 24 hours a day.
In other words, the guaranteed service range is the period during which the publisher and the host undertake to make the service available and, if necessary, to restore it as soon as possible (as provided for in the recovery time guarantee, discussed below).
The choice of PSG depends largely on how the software or service is used. For example:
- A public-facing website must be accessible 7 days a week over extended hours (or even 24 hours a day).
- On the other hand, a software package used by a company’s staff during office hours probably does not need such a service range: a PSG from 8 a.m. to 6 p.m., for example, may be enough.
Note: In the majority of cases, the service is not interrupted outside the guaranteed service range, which means that the software solution can still be used. However, during this period, the publisher reserves the right to carry out maintenance operations and is not obliged to restore the service immediately in the event of a breakdown.
Finally, PSG makes it possible to calculate an annual duration, which is used in particular to assess the availability rate. For example, a service range from 9 a.m. to 7 p.m., 5 days a week, gives about 2,600 hours per year (10 hours × 5 days × 52 weeks).
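The annual duration of a service range can be estimated with a few lines of Python. This is only a sketch: the function name `annual_psg_hours` and the 52-week year are assumptions made for illustration, and public holidays are ignored.

```python
def annual_psg_hours(start_hour, end_hour, days_per_week, weeks_per_year=52):
    """Approximate number of hours per year covered by a PSG.

    Assumes the same opening hours every covered day and a 52-week year
    (hypothetical simplification; public holidays are not deducted).
    """
    return (end_hour - start_hour) * days_per_week * weeks_per_year


# Service range from 9 a.m. to 7 p.m., 5 days a week
print(annual_psg_hours(9, 19, 5))  # 2600
```

Note that the 52-week approximation gives 8,736 hours for a 24/7 range, whereas counting 24 hours over 365 days gives 8,760 hours.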
GTR (recovery time guarantee)
The recovery time guarantee is the period within which the service provider undertakes to restore the service following an incident or an outage. This is a maximum duration, which means that the publisher can restore the service sooner, but never later. Finally, the GTR clock runs only during the guaranteed service range, which leads to some complications.
Take the example of a 3-hour GTR and a PSG from 9 a.m. to 7 p.m., 5 days a week:
- If a breakdown occurs on Monday at 9 a.m., it will be corrected the same day by noon.
- If an incident occurs on Wednesday at 6:30 p.m., the service provider will have until Thursday, 11:30 a.m., to intervene. Indeed, the PSG is interrupted between 7 p.m. and 9 a.m. the next day.
- Finally, if a problem arises during the weekend, it will only be resolved on Monday between 9 a.m. and 12 p.m., because Saturday and Sunday are not included in the guaranteed service range.
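The three cases above can be reproduced with a short Python sketch that counts GTR time only while the PSG is open. The function name `gtr_deadline`, its parameters and the sample dates (a week in January 2024) are assumptions chosen for illustration, and public holidays are not handled.

```python
from datetime import datetime, time, timedelta


def gtr_deadline(incident, gtr, open_at=time(9, 0), close_at=time(19, 0),
                 working_days=frozenset({0, 1, 2, 3, 4})):
    """Latest restoration time, counting only time inside the PSG.

    working_days uses datetime.weekday() numbering (0 = Monday).
    """
    remaining = gtr
    current = incident
    while True:
        day_open = datetime.combine(current.date(), open_at)
        day_close = datetime.combine(current.date(), close_at)
        if current.weekday() in working_days and current < day_close:
            start = max(current, day_open)   # clock starts when the PSG is open
            available = day_close - start    # PSG time left on that day
            if remaining <= available:
                return start + remaining
            remaining -= available
        # Nothing more can be counted today: jump to the next calendar day
        current = datetime.combine(current.date() + timedelta(days=1), time(0, 0))


three_hours = timedelta(hours=3)
print(gtr_deadline(datetime(2024, 1, 1, 9, 0), three_hours))    # Monday 9 a.m.      -> 2024-01-01 12:00:00
print(gtr_deadline(datetime(2024, 1, 3, 18, 30), three_hours))  # Wednesday 6:30 p.m. -> 2024-01-04 11:30:00
print(gtr_deadline(datetime(2024, 1, 6, 14, 0), three_hours))   # Saturday 2 p.m.    -> 2024-01-08 12:00:00
```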
The effectiveness of the GTR therefore depends largely on the PSG. This is why they must be defined together, depending on the service provided and the needs of the company.
For a critical application or a commercial website, for example, a PSG limited to office hours represents a risk. In the event of a breakdown on Friday evening, the service is likely to remain unavailable all weekend, regardless of the GTR deadline.
The availability rate
In general, availability refers to the probability that a computer system is able to function properly at a given time. Thus, the higher the availability rate of an online service, the more stable the service.
Calculated annually, quarterly or even monthly, the availability rate makes it possible to define the maximum duration of unavailability relative to the guaranteed service range. For example:
- With a PSG of 2,600 hours per year and an availability rate of 99%, the annual unavailability will not exceed 26 hours.
- With a 24/7 PSG of 8,760 hours per year and an availability rate of 99%, the annual unavailability will not exceed about 87.6 hours.
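The arithmetic behind these figures is simply the share of the PSG that is not covered by the availability rate. A minimal Python sketch, in which the function name `max_downtime_hours` is an assumption made for illustration:

```python
def max_downtime_hours(psg_hours, availability_pct):
    """Maximum unavailability allowed within the PSG, in hours."""
    return psg_hours * (100 - availability_pct) / 100


print(max_downtime_hours(2600, 99))  # 26.0
print(max_downtime_hours(8760, 99))  # 87.6
```

The same 99% rate thus allows more than three times as much downtime in absolute terms on a 24/7 range as on an office-hours range.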
Thus, the extent of the guaranteed service range has a direct impact on the maximum duration of unavailability. This is why the availability rate should always be interpreted in light of the PSG.
RPO (Recovery Point Objective)
The term RPO, which can be translated as “restoration objective”, refers to the maximum period of recorded data that a company can afford to lose without compromising its business. This data loss can be due to a breakdown, a disaster or any other incident.
To quantify this restoration objective, it is necessary to assess the volume of data produced, their level of importance, and also the backup windows that allow the data to be replicated to external units. The duration of the RPO is therefore linked to the time required to make the backup, to which must be added the time needed to restart the systems and reload the data.
In most cases, the RPO is one day (D-1), with a backup made every night. But this period can be reduced to half a day or less, which can be useful for systems handling a high volume of transactions.
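To make the link between backup frequency and RPO concrete, the sketch below measures how much data would be lost if an incident occurred at a given time, and compares it to a target RPO. The function names `data_loss_window` and `meets_rpo` and the sample dates are assumptions made for illustration; the sketch also assumes that each backup is complete and usable at the timestamp given.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, incident):
    """Time elapsed between the last backup taken before the incident
    and the incident itself, or None if no earlier backup exists."""
    earlier = [b for b in backup_times if b <= incident]
    if not earlier:
        return None
    return incident - max(earlier)


def meets_rpo(backup_times, incident, rpo):
    """True if the data lost at the time of the incident stays within the RPO."""
    loss = data_loss_window(backup_times, incident)
    return loss is not None and loss <= rpo


# Nightly backups at 2 a.m., incident the following afternoon at 5 p.m.
nightly = [datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 2, 2, 0)]
incident = datetime(2024, 1, 2, 17, 0)

print(data_loss_window(nightly, incident))                # 15:00:00
print(meets_rpo(nightly, incident, timedelta(hours=24)))  # True
print(meets_rpo(nightly, incident, timedelta(hours=12)))  # False
```

With nightly backups, a half-day RPO cannot be guaranteed: the backup frequency would have to be increased accordingly.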
Ultimately, the RPO depends on the activity, the size of the company and its use of the data. According to these criteria, we can use more or less expensive replication and backup techniques:
- A snapshot makes it possible to return to an earlier state of an environment, for example after a handling error.
- A backup, on the other hand, is an image made at regular intervals and hosted on a separate storage space. In the event of an incident on a physical server, the backup allows you to restore the data, whereas a snapshot would be lost.
GTI (intervention time guarantee)
The intervention time guarantee is the period within which the publisher undertakes to take charge of an incident or a customer request. However, “intervention” is not synonymous with “recovery”: the GTI does not commit the service provider to restoring the service within a given time.
Only the recovery time guarantee (GTR) can be considered a real guarantee of security, since it obliges the service provider to resolve the problem quickly.