For our workloads running on Kubernetes, we use the Grafana Agent to export metrics and logs to Grafana Cloud. We use both the Grafana Agent Operator and its custom resources such as GrafanaAgent, MetricsInstance, etc.
How do you manage microservice alerting with IaC? What are the best practices? The only things I can find online are about Terraform or ClickOps.
We deploy our services with Helm charts. Ideally, I would like an easy way to add the alerting configuration to the Helm chart itself, and to have a global config in the cloud alerting that routes on two alert labels:
app: the app label of the alert, used to route the alert to the matching Slack channel
team: the team label, used to route alerts per team.
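To make the idea concrete, here is a rough Python sketch of the routing behaviour I have in mind. Everything in it is made up for illustration: the channel names, the label values, the precedence (app first, then team), and the route_alert helper are assumptions, not an actual Grafana Cloud API or config format.

```python
# Hypothetical routing tables; none of these names come from Grafana Cloud.
APP_CHANNELS = {"checkout": "#alerts-checkout"}
TEAM_CHANNELS = {"payments": "#team-payments"}
DEFAULT_CHANNEL = "#alerts-catchall"


def route_alert(labels: dict[str, str]) -> str:
    """Pick a Slack channel from an alert's labels.

    Assumed precedence: a known `app` label wins, then a known `team`
    label, otherwise the alert falls through to a catch-all channel.
    """
    app = labels.get("app")
    if app in APP_CHANNELS:
        return APP_CHANNELS[app]
    team = labels.get("team")
    if team in TEAM_CHANNELS:
        return TEAM_CHANNELS[team]
    return DEFAULT_CHANNEL
```

The point is that each service would only need to set the app and team labels on its alerts, and the global config would own the mapping to channels.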
Previously, I shipped PrometheusRules alongside each microservice to keep the alerts close to the app. Each alert carried a label with the app name and a label with the team to alert. Prometheus and Alertmanager were deployed inside the cluster itself.
But the Grafana Agent does not support PrometheusRules, and I don’t want to “self-host” anything, since we use and pay for the Cloud offering.
How do you deal with alerting in this case?
Do you have a separate Terraform config that handles only Grafana Cloud alerting? (I would like to keep it K8s-native, to avoid splitting the tooling and to keep the alert config close to the app)
Deploy your own Alertmanager and Prometheus instance? (Then we lose the purpose of a managed Prometheus, and we don’t leverage the cloud offering)
Only do ClickOps through the cloud UI? (I want to keep everything IaC, so it is not an option)
Am I missing a clear and obvious way of dealing with Alerting?