Service Level Indicator

Be realistic

Sometimes only a proxy is available. For example, client-side latency is often the more user-relevant metric, but it might only be possible to measure latency at the server.

Common indicators

Request latency
Error rate / Availability (commonly expressed in the number of nines, e.g. 99% is “2 nines”)
System throughput
Data durability
Correctness (needless to say, but often not an SRE responsibility)

A few broad categories of services tend to find different SLIs relevant:

User-facing serving systems generally care about availability, latency and throughput.
Storage systems often emphasize latency, availability and durability.
Big data systems tend to care about throughput and end-to-end latency.

Aggregation

See Monitoring.

SLI templates

To save effort, build a set of reusable SLI templates for each common metric. Define the aggregation interval & regions, measurement frequency, scope, method of measurement, etc in the template.

🪴 Zero's Garden

Explorer

Service Level Indicator

Be realistic

Common indicators

Aggregation

SLI templates

Graph View

Table of Contents

Backlinks