Best practices for alerting with Kubernetes and Grafana Cloud LGTM stack

francois0ba7 · October 7, 2023, 10:05am

Hello

For our workloads/apps running on Kubernetes, we use Grafana Agent to export metrics & logs to Grafana Cloud. We use both the Grafana Agent Operator and its custom resources such as GrafanaAgent, MetricsInstance, etc.

How do you manage micro-services alerting with IaC? What are the best practices? The only things I can find online are about Terraform or ClickOps.

We deploy our services with helm charts. Ideally I would like an easy way to add the alerting configuration to the helm chart, and to have a global config on the cloud alerting side stating:
check the app label of the alert, and route the alert to the Slack channel #alerts-<$app>. Or use a team label to route alerts per team.

I previously shipped PrometheusRules alongside each micro-service, to keep them close to the app. Each alert carried a label with the app name and the team to alert. Prometheus & AlertManager were deployed inside the cluster itself.

But with the Grafana Agent, PrometheusRules are not supported, and I don’t want to “self-host” anything as we use & pay for the Cloud offering.

How do you deal with alerting in this case?

Do you have a separate Terraform config that handles only Grafana Cloud alerting? (I would like to keep it K8s native, to avoid splitting the tools, and to keep the alert config close to the app)
Deploy your own AlertManager & Prometheus instance? (Then we lose the purpose of a managed Prometheus, and we don’t leverage the cloud offering)
Only do ClickOps through the cloud UI? (I want to keep everything IaC, so it is not an option)

Am I missing a clear and obvious way of dealing with Alerting?

theSuess · October 16, 2023, 12:12pm

Ahoi!

Prometheus/Mimir evaluates the rules server-side. As we cannot “look into” your cluster, these resources are not supported by the deployment structure you described.

The only way to install recording/alerting rules into Grafana Cloud is by talking to the ruler API directly. As you have correctly discovered, this can be done with Grafana or Terraform. Neither works well for your use case, as you’ve already explained.

But there is another way! Using mimirtool you can interact with the ruler by specifying your rule definitions as YAML files. That way, you can keep all the code in one place and don’t have to onboard people to use terraform. To apply this, the only additional step is to add the mimirtool rules sync command to your deployment pipeline.
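
As a rough sketch of what such a rule file could look like (not an official example): the file path, namespace, alert name, metric and label values below are all placeholders I made up. The app and team labels are the ones you mentioned wanting to route on.

```yaml
# rules/my-service.yaml (hypothetical path, kept in the service repo next to the helm chart)
namespace: my-service          # ruler namespace; the name is an assumption
groups:
  - name: my-service-availability
    rules:
      - alert: MyServiceHighErrorRate
        # http_requests_total and its labels are placeholders for whatever your app exports
        expr: |
          sum(rate(http_requests_total{app="my-service", status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{app="my-service"}[5m])) > 0.05
        for: 10m
        labels:
          app: my-service      # used later for routing
          team: payments       # hypothetical team name
          severity: warning
        annotations:
          summary: "my-service 5xx ratio has been above 5% for 10 minutes"

# Pipeline step, with the endpoint, instance ID and API key taken from your stack:
#   mimirtool rules sync --address=<prometheus endpoint> --id=<instance id> --key=<api key> rules/my-service.yaml
```

One thing to check before wiring this into per-service pipelines: sync reconciles the ruler with the files you pass, so groups that are not in those files can be removed. Scoping the sync with the --namespaces flag (one namespace per service) is probably what you want here, but please verify against the mimirtool docs for your version.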

Here’s more cloud specific documentation on this topic: https://grafana.com/docs/grafana/v10.0/alerting/set-up/set-up-cloud/
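
For the per-app routing part of the question, the Cloud Alertmanager accepts a regular Alertmanager configuration, so a templated Slack channel might cover it. This is only a sketch under assumptions: the receiver name and webhook URL are placeholders, every alert is assumed to carry an app label, and whether the channel override is honoured depends on how your Slack integration is set up.

```yaml
route:
  receiver: slack-per-app            # hypothetical receiver name
  group_by: ['app', 'alertname']     # grouping by app keeps it in the group's common labels

receivers:
  - name: slack-per-app
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME   # placeholder webhook URL
        # channel is a templated field, so it can be derived from the alert's labels
        channel: '#alerts-{{ .CommonLabels.app }}'
        send_resolved: true
```

Swapping app for team in both places would route per team instead. A config like this can be uploaded with mimirtool alertmanager load, so it can live in Git as well.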

Hope this helps!

francois0ba7 · October 19, 2023, 1:50pm

Hey,

Thanks for the detailed answer!

Do you know if, in the near future, the Grafana Agent in flow mode will be able to read PrometheusRule resources from the Prometheus CRDs and upload them to the ruler directly?

theSuess · October 19, 2023, 1:52pm

As the agent is mostly focused on telemetry data, support for this use case won’t be implemented there.

We’re currently looking into offering a way to support this use case, but it’ll probably be a separate service/operator taking care of this - I’ll keep you posted!

francois0ba7 · October 25, 2023, 8:50am

Thanks, looking forward to it
