How to manage dashboard versions and ngAlerts in Grafana 9?

  • What Grafana version and what operating system are you using?
    Grafana 9.0.0 in a container

  • What are you trying to achieve?
    I’m looking for the best practice for setting up monitoring and alerting, so that there’s a dashboard for each application and alerts that I can send to Slack.

I think this is a fairly standard use case for a software company. I have a bunch of applications (30 or so) deployed on k8s that expose metrics to Prometheus. (There’s one k8s cluster per environment, dev/test/acc/prod, each with its own Prometheus and Grafana instance.) I have a dashboard that I’d like to use for each of them. The dashboard has 20 or 30 panels and 9 alerts and is replicated across the DTAP environments. Each application has its own alert channel. Some metrics come from scraping the application’s metrics endpoint and some come from kube-state-metrics. For the former I filter by service=x; for the latter, by either deployment=x or container=x. At the moment there is no in-house convention that helps me relate the three.

  • How are you trying to achieve it? What Happened?
    Before Grafana 8.2, I could copy and paste a template dashboard, change some values and have a working dashboard plus alerts. After this version, copying the dashboard does not copy the alerts anymore. This thread lists some info and suggests a workaround via the grafana API.
    At the same time, several places suggest that ngAlerts supports template variables in some way, which in theory would allow a single dashboard + alert configuration to serve all 30 applications in that environment.

  • What are you looking for?
    I’m looking for a best practice. I see these paths:

    • I could replicate the dashboard 30 (applications) * 4 (environments) (=120 dashboards), which is a lot of work already. Having to recreate 9 alerts for each of those is just too much and invites errors.
    • I could ‘generify’ the dashboard, set a template variable for the application and… get stuck, because I need to correlate 3 variables somehow and even the new alert system doesn’t seem to support template variables in alert queries. It does seem to be possible these days to send the alerts to a different channel per application, though.
    • I could learn the Grafana API and build a system that provisions dashboards for those applications. (Alerts don’t work via dashboard JSON files, so I can’t just generate the files and drop them somewhere.) This sounds like the best option and has the added benefit that the JSON could be stored in a code repository, but it is also yet another system to maintain, and it will take a while before I have any results.
    • Some other way? Grafana 9 was probably designed with a certain strategy in mind for this. Which one?
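
For the "generate it from a template" path, a minimal sketch of the generation step could look like the following. It only assumes that the template dashboard is plain JSON and that a per-application mapping of the three label values is maintained by hand, since there is no naming convention to derive them from. The `__SERVICE__`-style placeholders, the `APPS` mapping, the application names and the `http_requests_total` metric are all made up for illustration; nothing here calls the Grafana API.

```python
import json

# Hypothetical per-application mapping of the three label values that
# have no naming convention relating them.
APPS = {
    "billing": {
        "APP": "billing",
        "SERVICE": "billing-svc",
        "DEPLOYMENT": "billing",
        "CONTAINER": "billing-app",
    },
}

# Hypothetical template: a trimmed-down dashboard with placeholders.
TEMPLATE = {
    "title": "__APP__ overview",
    "panels": [
        {
            "title": "Request rate",
            "targets": [
                {"expr": 'rate(http_requests_total{service="__SERVICE__"}[5m])'}
            ],
        },
        {
            "title": "Replicas",
            "targets": [
                {"expr": 'kube_deployment_status_replicas{deployment="__DEPLOYMENT__"}'}
            ],
        },
    ],
}


def render(node, values):
    """Return a copy of `node` with every __KEY__ placeholder replaced."""
    if isinstance(node, str):
        for key, value in values.items():
            node = node.replace("__" + key + "__", value)
        return node
    if isinstance(node, list):
        return [render(item, values) for item in node]
    if isinstance(node, dict):
        return {key: render(value, values) for key, value in node.items()}
    return node  # numbers, booleans and None are left untouched


# One rendered dashboard per application; the template itself is not modified.
dashboards = {name: render(TEMPLATE, values) for name, values in APPS.items()}
print(json.dumps(dashboards["billing"], indent=2))
```

Walking the parsed structure instead of doing a search-and-replace on the raw JSON text keeps the output valid JSON even if a value contains quotes. The rendered files could then be kept in a repository per environment; how they get into Grafana, and how the matching alerts are created, is a separate step.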

The new alerting platform decouples alerts from dashboards, so can you clarify: how old are these alerts? Are they using the legacy alerting platform?

Also, here are the newest stable API endpoints for provisioning Grafana alerts:

The original alerts were made in Grafana 7 or so and were automatically converted to the new system. They (and the datasources) also automatically got an id, which seems to interfere with copying a dashboard to either the same or another environment. I guess I can try manually erasing those ids before copying the JSON and see if that brings up the alerts (on the same instance) and the dashboards (on a different Grafana instance).
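
Erasing the ids by hand could be scripted roughly as below. This is a sketch under assumptions: it takes the ids in question to be the dashboard's top-level `id` and `uid` (Grafana treats a null `id` as "create a new dashboard" and generates a `uid` when none is given) plus the datasource `uid` references inside panels and targets. The function name, the `datasource_uids` mapping and the example uids are hypothetical. Panel ids are left alone because they only need to be unique within the dashboard.

```python
import copy


def prepare_for_copy(dashboard, datasource_uids=None):
    """Return a copy of a dashboard dict that can be imported as a new one.

    `datasource_uids` maps a datasource uid on the source instance to the
    uid of the equivalent datasource on the target instance (assumption:
    datasources are referenced as {"type": ..., "uid": ...} objects).
    """
    datasource_uids = datasource_uids or {}
    cleaned = copy.deepcopy(dashboard)  # never modify the caller's dict
    cleaned["id"] = None   # null id: import as a new dashboard
    cleaned["uid"] = None  # null uid: let Grafana generate one

    def remap(node):
        if isinstance(node, dict):
            ds = node.get("datasource")
            if isinstance(ds, dict) and ds.get("uid") in datasource_uids:
                ds["uid"] = datasource_uids[ds["uid"]]
            for value in node.values():
                remap(value)
        elif isinstance(node, list):
            for item in node:
                remap(item)

    remap(cleaned)
    return cleaned


# Example with made-up uids for a dev and a prod Prometheus datasource.
source = {
    "id": 12,
    "uid": "abc123",
    "title": "billing overview",
    "panels": [
        {
            "id": 1,
            "datasource": {"type": "prometheus", "uid": "dev-prom"},
            "targets": [{"datasource": {"type": "prometheus", "uid": "dev-prom"}}],
        }
    ],
}
copied = prepare_for_copy(source, {"dev-prom": "prod-prom"})
```

Since the new alerting platform keeps alert rules separate from the dashboard JSON, this would at best make the dashboard itself portable; it would not carry the alerts over.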

It’s just, it feels like I’m trying to figure stuff out that should be public knowledge and I don’t see it anywhere. Feels wrong.

Thanks for the OpenAPI specs. That helps a lot if I go the API route. Is that what you would advise?

What I would really like to see here is to have alerts provisioned as we provision dashboards and datasources, with files dropped into provisioning/alerts manually or with a sidecar from Kubernetes configmaps.