Definition

Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.

Balance with Engineering Work

Engineering work is novel and intrinsically requires human judgment. It produces a permanent improvement in your service, and is guided by a strategy. It is frequently creative and innovative, taking a design-driven approach to solving a problem—the more generalized, the better. It helps your team or the SRE organization handle a large service, or more services, with the same level of staffing.

Typical SRE activities involve

  • Software engineering
    • Involves writing or modifying code, in addition to any associated design and documentation work.
  • Systems engineering
    • Involves configuring production systems, modifying configurations, or documenting systems in a way that produces lasting improvements from a one-time effort.
    • Examples include monitoring setup and updates, load balancing configuration, server configuration, tuning of OS parameters, and load balancer setup.
    • Also includes consulting on architecture, design and productionization for developer teams.
  • Toil
  • Overhead: Administrative work not tied directly to running a service.

Toil tends to be spiky, so a steady 50% may not be realistic.