How we improved on-call life by reducing pager noise
To monitor the health of GitLab.com, we use multiple SLIs for each service. When one of these SLIs is not meeting our internal SLOs and is burning through the error budget, we page the on-call engineer, in the hope of fixing the problem before many of our users even notice. All of our services' SLIs […]
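To make the paging condition concrete, here is a minimal sketch of an error-budget burn-rate check. It is not GitLab's actual alerting code. The `SliWindow` structure, the `burn_rate` and `should_page` helpers, and the threshold value are all hypothetical and exist only for illustration. The idea is to compare the observed error ratio for one SLI against the error ratio the SLO allows, and to page only when the budget is being spent much faster than that allowed rate.

```python
from dataclasses import dataclass


@dataclass
class SliWindow:
    """Event counts for one SLI over a single evaluation window (hypothetical structure)."""
    good_events: int
    total_events: int


def burn_rate(window: SliWindow, slo_target: float) -> float:
    """Return how fast the error budget is being consumed, relative to the allowed rate.

    A burn rate of 1.0 means the budget would be used up exactly at the end of
    the SLO period; higher values mean it is being spent faster than that.
    """
    if window.total_events == 0:
        return 0.0  # no traffic in this window, so nothing is being burned
    error_ratio = 1 - window.good_events / window.total_events
    error_budget = 1 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(window: SliWindow, slo_target: float, threshold: float) -> bool:
    """Page only when the burn rate meets or exceeds the (hypothetical) threshold."""
    return burn_rate(window, slo_target) >= threshold


if __name__ == "__main__":
    slo = 0.999
    # 1% errors against a 0.1% budget: burning roughly 10x too fast, below the threshold.
    print(should_page(SliWindow(good_events=990, total_events=1000), slo, threshold=14.4))
    # 2% errors: roughly 20x too fast, so this window would page.
    print(should_page(SliWindow(good_events=980, total_events=1000), slo, threshold=14.4))
```

The 14.4 threshold is only illustrative: sustained for one hour, that burn rate would consume 2% of a 30-day error budget (14.4 / 720 hours). Paging on a high burn rate rather than on any single failed request is one way to keep the pager quiet for brief blips while still waking someone when the budget is genuinely at risk.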