Error budgets your engineers won't quietly resent
SLOs fail when they're handed down from on high. Here's how to set error budgets the team actually believes in — and uses.

Error budgets are simple math and hard politics. The math: if your SLO is 99.9%, you have 0.1% to spend on failure. The politics: who decides the number, and what happens when it runs out. Get the politics wrong and the whole thing becomes theater.
Start from user pain, not uptime
A good SLO measures something a user would actually complain about. "The API is up" is useless if every request takes nine seconds. Measure the experience:
- Availability — fraction of requests that succeed.
- Latency — fraction served under a threshold users notice.
SLO: 99.5% of /checkout requests succeed in < 800ms over 28 days
Budget: 0.5% → ~3.4 hours of failed or too-slow requests per 28 days (0.5% of 672 hours)
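The arithmetic behind that example can be sketched in a few lines of Python. This is a minimal illustration, not the output of any monitoring tool: the function names, the (succeeded, latency_ms) record shape, and the default 800 ms threshold are assumptions made for the example.

```python
from datetime import timedelta
from typing import Iterable, Tuple


def sli(requests: Iterable[Tuple[bool, float]], threshold_ms: float = 800.0) -> float:
    """Fraction of requests that succeeded AND finished under the threshold."""
    total = 0
    good = 0
    for succeeded, latency_ms in requests:
        total += 1
        if succeeded and latency_ms < threshold_ms:
            good += 1
    if total == 0:
        raise ValueError("no requests in the window")
    return good / total


def error_budget_fraction(slo_target: float) -> float:
    """Fraction of requests allowed to be bad, e.g. 0.995 -> 0.005."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def budget_as_time(slo_target: float, window: timedelta) -> timedelta:
    """Budget expressed as time, assuming traffic is spread evenly over the window."""
    return window * error_budget_fraction(slo_target)


def budget_as_requests(slo_target: float, total_requests: int) -> int:
    """Number of requests in the window that may fail or be too slow."""
    return round(total_requests * error_budget_fraction(slo_target))
```

For the /checkout example, budget_as_time(0.995, timedelta(days=28)) comes to about 3.36 hours. The time figure is only a convenient translation: the SLO itself counts requests, so a quiet hour of failures costs less budget than a busy one.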
Let the budget drive decisions
The point of a budget is to make a tradeoff explicit:
- Budget left over? Ship faster, take more risk.
- Budget burned? Freeze features and fix reliability.
The number decides, not the loudest person in the incident review.
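One way to let the number decide is to encode the rule so nobody has to argue it during an incident. The sketch below is a hypothetical example: the function names and the policy strings are assumptions, and a real team would substitute whatever it agreed to do when the budget runs out.

```python
def budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def release_policy(remaining: float) -> str:
    """Map the remaining budget to the tradeoff described above."""
    if remaining <= 0.0:
        return "freeze features, fix reliability"
    return "ship, take more risk"


# Example: 1,000,000 checkout requests, 3,000 of them failed or too slow.
remaining = budget_remaining(0.995, good=997_000, total=1_000_000)
decision = release_policy(remaining)
```

With a 99.5% target, those 3,000 bad requests use 60% of the 5,000 allowed, so the policy still says to ship.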
Set them with the team, not for the team
The fastest way to kill an SLO is to impose it. Run a short workshop: pick the one or two journeys that matter, agree on thresholds, and write down what the team will do when the budget burns. Ownership of the number is what makes people honor it.
Review monthly, adjust without shame
Your first SLO will be wrong: too tight or too loose. That's expected. Review the burn each month and adjust the target or the threshold when the data shows it is off. An SLO you never revisit is just a dashboard nobody trusts.
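A monthly review can start from a very small check like the one below. It is only a sketch under stated assumptions: the input is the fraction of budget consumed in each recent window (1.0 means fully spent), and the 20% cutoff for "too loose" is an arbitrary illustrative value, not a recommendation from this article.

```python
from typing import Sequence


def review_hint(consumed_by_window: Sequence[float], loose_below: float = 0.2) -> str:
    """Suggest a direction for the next SLO review from recent budget burn."""
    if not consumed_by_window:
        return "no data yet"
    if all(c > 1.0 for c in consumed_by_window):
        # Budget overspent in every window: the target may be unrealistic,
        # or reliability work is overdue.
        return "possibly too tight"
    if all(c < loose_below for c in consumed_by_window):
        # Budget barely touched: the target may not reflect what users notice.
        return "possibly too loose"
    return "looks reasonable"
```

The output is a prompt for the team's conversation, not a verdict: a string of overspent months might mean the target is wrong, or that the freeze policy was never actually followed.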