Roughly 70% of outages are due to changes in a live system.
Trio of practices
- Implementing progressive rollouts
- Quickly and accurately detecting problems
- Rolling back changes safely when problems arise
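The three practices above fit together in a single control loop. The following is a minimal sketch, not a real deployment tool: `progressive_rollout` and the `deploy`, `is_healthy`, and `rollback` callbacks are hypothetical names introduced for illustration, and the stage fractions are arbitrary.

```python
from typing import Callable, Sequence


def progressive_rollout(
    stages: Sequence[float],
    deploy: Callable[[float], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Push a change to growing fractions of the fleet.

    All callbacks are hypothetical stand-ins for real deployment tooling.
    Returns True if every stage passed its health check, False if the
    change was rolled back.
    """
    for fraction in stages:
        # Progressive rollout: expose only `fraction` of the fleet.
        deploy(fraction)
        # Detection: check health before widening the rollout.
        if not is_healthy():
            # Rollback: undo the change as soon as a check fails.
            rollback()
            return False
    return True
```

A caller might pass stages such as `[0.01, 0.10, 0.50, 1.0]`, so that a bad change is caught while it affects only a small fraction of the fleet.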
Automation
Remove humans from the loop: automate rollouts, problem detection, and rollbacks.
Mandatory Review
Velocity
Push frequency should be guided by the service's error budget.
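One way to make this concrete is to gate pushes on how much of the error budget remains. The sketch below assumes a request-based SLO, where the error budget is the fraction of requests allowed to fail (1 minus the SLO). The function names and the `min_remaining` threshold are hypothetical, chosen only for illustration.

```python
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    Assumes a request-based SLO such as 0.999, with 0 < slo < 1.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    # The budget is the number of requests allowed to fail in the window.
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        # No traffic in the window, so nothing has been spent.
        return 1.0
    return 1.0 - failed_requests / allowed_failures


def may_push(slo: float, total_requests: int, failed_requests: int,
             min_remaining: float = 0.0) -> bool:
    """Allow a push only while more than `min_remaining` of the budget is left."""
    return error_budget_remaining(slo, total_requests, failed_requests) > min_remaining
```

For example, with a 99.9% SLO and 1,000,000 requests in the window, roughly 1,000 failures are allowed. After 250 failures about three quarters of the budget remains and pushes can continue; after 1,200 failures the budget is overspent and pushes should pause.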
Info
"Push" means any change to a service’s running software or its configuration.