Hello
For our workloads/apps running on Kubernetes, we use Grafana Agent to export metrics and logs to Grafana Cloud. We use both the Grafana Agent Operator and its custom resources such as GrafanaAgent, MetricsInstance, etc.
How do you manage microservice alerting with IaC? What are the best practices? The only things I can find online are about Terraform or ClickOps.
We deploy our services with Helm charts. Ideally, I would like an easy way to add the alerting configuration to the Helm chart, and to have a global config in Cloud Alerting that says: check the app label of the alert and route the alert to the Slack channel #alerts-<$app>. Or use a team label to route alerts per team.
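To make the routing rule I have in mind concrete, here is a rough sketch of the logic in Python. It is only an illustration of the intent, not a Grafana Cloud API: the function name slack_channel_for and the fallback channel are made up for this example.

```python
from typing import Mapping

# Hypothetical fallback channel for alerts that lack the routing label.
DEFAULT_CHANNEL = "#alerts-unrouted"


def slack_channel_for(labels: Mapping[str, str], route_by: str = "app") -> str:
    """Return the Slack channel for an alert, derived from one of its labels.

    route_by is the label to route on: "app" for per-app channels,
    "team" for per-team channels.
    """
    value = labels.get(route_by, "").strip()
    if not value:
        # No usable label on the alert: send it to the fallback channel.
        return DEFAULT_CHANNEL
    return f"#alerts-{value}"


if __name__ == "__main__":
    alert_labels = {"app": "checkout", "team": "payments"}
    print(slack_channel_for(alert_labels))                   # route per app
    print(slack_channel_for(alert_labels, route_by="team"))  # route per team
```

The point is that the global config would never need to know the list of apps or teams; the channel name is derived from the label on each alert.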
In a previous setup, I shipped a PrometheusRule alongside each microservice to keep the alerts close to the app. Each alert carried a label with the app name and another with the team to notify. Prometheus and Alertmanager were deployed inside the cluster itself.
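For reference, this is roughly what such a per-service rule looked like, sketched here as a Python function that builds the manifest. The apiVersion, kind, and spec layout follow the Prometheus Operator PrometheusRule resource; the alert name, the metric http_requests_total, the threshold, and the helper prometheus_rule are placeholders I made up for the example.

```python
import json


def prometheus_rule(app: str, team: str) -> dict:
    """Build a PrometheusRule manifest whose alert carries app and team labels."""
    # Placeholder expression: the metric name and threshold are assumptions.
    expr = f'sum(rate(http_requests_total{{app="{app}",status=~"5.."}}[5m])) > 1'
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": f"{app}-alerts", "labels": {"app": app}},
        "spec": {
            "groups": [
                {
                    "name": f"{app}.rules",
                    "rules": [
                        {
                            "alert": "HighErrorRate",  # placeholder alert name
                            "expr": expr,
                            "for": "10m",
                            # These labels are what the routing relies on.
                            "labels": {"app": app, "team": team, "severity": "warning"},
                            "annotations": {"summary": f"{app} is returning 5xx errors"},
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # JSON is also accepted by kubectl, so no YAML library is needed here.
    print(json.dumps(prometheus_rule("checkout", "payments"), indent=2))
```

In the real setup this lived as a template in each service's Helm chart, so the rule was versioned and deployed together with the app.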
But with the Grafana Agent Operator, PrometheusRule is not supported, and I don’t want to “self-host” anything as we use and pay for the Cloud offering.
How do you deal with alerting in this case?
- Do you have a separate Terraform config that handles only Grafana Cloud alerting? (I would like to keep it K8s-native, to avoid splitting the tooling and to keep the alert config close to the app.)
- Do you deploy your own Alertmanager and Prometheus instances? (Then we lose the purpose of a managed Prometheus, and we don’t leverage the Cloud offering.)
- Do you only do ClickOps through the Cloud UI? (I want to keep everything as IaC, so this is not an option.)
Am I missing a clear and obvious way of dealing with Alerting?