SRE

What is an Error Budget?

Difficulty: unrated

Source: bregman-arie/devops-exercises by Arie Bregman

Answer

An error budget is the amount of downtime or errors a service can accumulate over a given period while still meeting its Service Level Objective (SLO).

The error budget is 100% minus the service's SLO. For example, a service with a 99.9% SLO has a 0.1% error budget.

If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period.
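The arithmetic above can be sketched in Python. This is only a minimal illustration, assuming the SLO is measured as the share of successful requests; the helper name `error_budget` and its parameters are hypothetical and not taken from any library.

```python
from fractions import Fraction


def error_budget(slo_percent: str, total_requests: int) -> int:
    """Return how many failed requests the SLO allows in the period.

    Hypothetical helper. slo_percent is passed as a string (e.g. "99.9")
    so Fraction can represent it exactly and avoid float rounding.
    """
    # Budget as a fraction of all requests: 100% minus the SLO.
    budget_fraction = 1 - Fraction(slo_percent) / 100
    # Truncate to whole requests; a partial error cannot be spent.
    return int(budget_fraction * total_requests)


print(error_budget("99.9", 1_000_000))  # 1000
```

Passing the SLO as a string is a design choice for exactness; with plain floats, `1 - 0.999` is not exactly `0.001`, which can shift the result by one request after truncation.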

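The error budget is a mechanism for balancing innovation and stability: while budget remains, the team can spend it on releases and other risky changes; once it is exhausted, the focus shifts to reliability until the service is back within its SLO. If SREs cannot enforce the error budget, the mechanism stops working.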
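One way a team might enforce the budget is to compare the errors observed so far with the errors allowed, and hold back risky releases once the budget is spent. The sketch below assumes such a policy; the names `budget_remaining` and `can_release` are hypothetical, and real policies are usually agreed per team rather than hard-coded like this.

```python
def budget_remaining(allowed_errors: int, observed_errors: int) -> float:
    """Return the unspent share of the error budget (negative if overspent)."""
    if allowed_errors <= 0:
        raise ValueError("allowed_errors must be positive")
    return (allowed_errors - observed_errors) / allowed_errors


def can_release(allowed_errors: int, observed_errors: int) -> bool:
    """Assumed policy: allow releases only while some budget is left."""
    return budget_remaining(allowed_errors, observed_errors) > 0


# With a budget of 1,000 errors and 250 already observed,
# 75% of the budget is left and releases may continue.
print(budget_remaining(1000, 250))  # 0.75
print(can_release(1000, 250))       # True
print(can_release(1000, 1000))      # False
```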

Read more: Google SRE Handbook