Roughly 70% of outages are due to changes in a live system.

Trio of practices

  • Implementing progressive rollouts
  • Quickly and accurately detecting problems
  • Rolling back changes safely when problems arise
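
The three practices above fit together as one loop, sketched here in Python. This is only an illustration: `progressive_rollout` and its callbacks `apply_change`, `is_healthy`, and `rollback` are hypothetical names, not part of any real deployment tool, and the stage fractions are arbitrary.

```python
from typing import Callable, Sequence


def progressive_rollout(
    stages: Sequence[float],
    apply_change: Callable[[float], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Expose a change to growing fractions of traffic.

    Returns True if every stage passed its health check, or False if a
    check failed and the change was rolled back.
    All callbacks are hypothetical hooks supplied by the caller.
    """
    for fraction in stages:
        apply_change(fraction)   # progressive rollout: widen exposure
        if not is_healthy():     # detection: check before going wider
            rollback()           # safe rollback: restore the previous state
            return False
    return True
```

A caller might pass stages such as `[0.01, 0.1, 0.5, 1.0]`, so that a bad change is caught while it affects only a small share of traffic.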

Automation

Remove humans from the loop; automate these practices wherever possible.

Mandatory Review

Velocity

Push frequency should be guided by the service’s error budget.
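
One way to make this concrete is the sketch below. It assumes the common definition of an error budget as the failures an SLO permits (1 minus the SLO target, times the request count in the window). The function names and the `min_remaining` threshold are hypothetical, and the gating policy itself is an assumption rather than something these notes prescribe.

```python
def error_budget_remaining(
    slo_target: float, total_requests: int, failed_requests: int
) -> float:
    """Fraction of the error budget left in the current window.

    1.0 means the budget is untouched; 0 or less means it is exhausted.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        return 0.0  # no budget at all (100% SLO target or no traffic)
    return 1.0 - failed_requests / allowed_failures


def may_push(
    slo_target: float,
    total_requests: int,
    failed_requests: int,
    min_remaining: float = 0.0,  # hypothetical policy threshold
) -> bool:
    """Allow a push only while more than min_remaining of the budget is left."""
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    return remaining > min_remaining
```

For example, with a 99.9% target and 1,000,000 requests, about 1,000 failures are allowed; 250 failures leave roughly three quarters of the budget, while 1,200 failures exhaust it and would pause pushes under this policy.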

Info

A “push” is any change to a service’s running software or its configuration.