Principles
- Ask questions until you understand the problem well enough to fix it, rather than following the common advice to “fix first, ask questions later.”
Playbooks
Humans add latency, so think through the best practices ahead of time and record them in a “playbook”. As the SRE Book puts it, no playbook is a substitute for smart engineers able to think on the fly, but clear troubleshooting steps help when responding to a high-stakes or time-sensitive page.
Runbooks
A runbook is a set of instructions for completing a routine task.
Runbook Discoverability
A runbook template should include a section at the top describing the intent of the runbook in one sentence.
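As a rough illustration of the point above, here is a minimal sketch of a template that puts a one-sentence intent at the top and of a helper that pulls that line out for an index or search listing. The layout (`Title:`, `Intent:`, `Steps:`) and the function names `render_runbook` and `extract_intent` are hypothetical, not taken from any particular tool, and the single-sentence check is deliberately crude.

```python
import re
from typing import List, Optional

# Hypothetical template: the intent comes right after the title so that
# anyone skimming a list of runbooks can tell what each one is for.
RUNBOOK_TEMPLATE = """\
Title: {title}
Intent: {intent}

Steps:
{steps}
"""


def render_runbook(title: str, intent: str, steps: List[str]) -> str:
    """Fill in the template, rejecting intents that are not one sentence."""
    intent = intent.strip()
    # Crude check: a single line with exactly one terminator, at the very end.
    if not re.fullmatch(r"[^.!?\n]+[.!?]", intent):
        raise ValueError("intent must be exactly one sentence")
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return RUNBOOK_TEMPLATE.format(title=title, intent=intent, steps=numbered)


def extract_intent(runbook_text: str) -> Optional[str]:
    """Return the intent line for use in an index, or None if it is missing."""
    match = re.search(r"^Intent: (.+)$", runbook_text, flags=re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    text = render_runbook(
        "Rotate TLS certificate",
        "Replace an expiring TLS certificate on the load balancer.",
        ["Request a new certificate", "Deploy it", "Verify the expiry date"],
    )
    print(text)
    print("Index entry:", extract_intent(text))
```

Keeping the intent on a single, predictably labelled line is what makes it cheap to build a searchable index of runbooks; the exact label matters less than using it consistently.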
Examples
- https://chrisphillips-cminion.github.io/day2-ops/2021/11/08/RunBook.html (playbook)
- https://www.transposit.com/devops-blog/devops/create-runbook-template-devops/ (runbook)
References
- The SRE Book
- https://response.pagerduty.com/
- https://handbook.gitlab.com/handbook/engineering/infrastructure/incident-management/
- https://www.transposit.com/devops-blog/devops/runbooks-playbooks-sops/
- https://www.transposit.com/devops-blog/sre/2020.01.30-writing-runbook-documentation-when-youre-an-sre/
- https://blog.danslimmon.com/2024/05/15/ask-questions-first-shoot-later/