Hello, I’m trying to replace an ancient Grafana setup (backed by InfluxDB) with modern Grafana (12.3) as a proof of concept. I’ve got Grafana configured and able to query InfluxDB, and now I’m trying to build replacements for what exists today.
We have a lot of ephemeral nodes that each create a dashboard (with alerts) for themselves when they come up, then remove the dashboard (and alerts) when they shut down. This works well enough, but now that Grafana alerting has decoupled alerts from dashboards, I’d like to use multi-dimensional alerting so we don’t have to do this. Instead of having 20 webservers come online and each create a dashboard for itself that monitors webserver things, I’ll just have alerts that monitor all of the webservers and notify if something is wrong with any, many, or all of them.
So far, so good. Where I’m getting hung up is teardown. When scaling down, I can’t work out how to tell Grafana not to alert when a node goes missing on purpose. I could just turn off “No Data” alerts, but those are useful when a server becomes unresponsive because of an OOM, an I/O problem, or something similar.
The best option I can think of is to have the grouped alerts I’ve described ignore No Data conditions, and then have each node set up a single alert of its own (like CPU) to catch No Data conditions, removing it on shutdown. This feels kludgy to me, so I’m creating this post to see if there’s a more elegant way to handle this, or if I’m missing something obvious or just overthinking it.
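For concreteness, here is a rough, untested sketch of that per-node fallback in Python. It assumes Grafana’s alerting provisioning HTTP API (`POST /api/v1/provisioning/alert-rules` when a node starts, `DELETE /api/v1/provisioning/alert-rules/{uid}` when it shuts down) and a service-account token. The Grafana URL, folder UID, datasource UID, rule group name, the `nodata-<host>` UID scheme and the `cpu`/`usage_idle` InfluxQL query are all hypothetical placeholders. The rule’s threshold can never be met on real data, so it should only fire through its No Data setting. The code only builds the requests; exporting an existing rule from the Grafana UI is a safer way to get the exact payload shape for a given version.

```python
import json
import urllib.request

# All of these values are hypothetical placeholders; substitute your own.
GRAFANA_URL = "http://grafana.example.internal:3000"
FOLDER_UID = "ephemeral-nodes"   # folder holding the per-node rules
INFLUX_DS_UID = "influxdb-main"  # UID of the InfluxDB datasource in Grafana
RULE_GROUP = "node-heartbeats"   # rule group for the per-node No Data rules


def rule_uid(host: str) -> str:
    # Assumes the hostname yields a valid Grafana UID
    # (at most 40 characters: letters, digits, '-' and '_').
    return f"nodata-{host}"


def build_nodata_rule(host: str) -> dict:
    """Per-node rule whose threshold never fires; it should only alert on No Data."""
    return {
        "uid": rule_uid(host),
        "orgID": 1,
        "title": f"No data from {host}",
        "folderUID": FOLDER_UID,
        "ruleGroup": RULE_GROUP,
        "condition": "C",
        "data": [
            {
                # A: hypothetical InfluxQL query; measurement, field and tag names are assumptions.
                "refId": "A",
                "relativeTimeRange": {"from": 300, "to": 0},
                "datasourceUid": INFLUX_DS_UID,
                "model": {
                    "refId": "A",
                    "rawQuery": True,
                    "resultFormat": "time_series",
                    "query": (
                        'SELECT mean("usage_idle") FROM "cpu" '
                        f"WHERE \"host\" = '{host}' AND $timeFilter GROUP BY time(1m)"
                    ),
                },
            },
            {
                # B: server-side expression reducing the series to its last value.
                "refId": "B",
                "relativeTimeRange": {"from": 0, "to": 0},
                "datasourceUid": "__expr__",
                "model": {"refId": "B", "type": "reduce", "expression": "A", "reducer": "last"},
            },
            {
                # C: idle CPU is never negative, so this threshold never fires by itself;
                # the rule is meant to alert only via noDataState when A returns nothing.
                "refId": "C",
                "relativeTimeRange": {"from": 0, "to": 0},
                "datasourceUid": "__expr__",
                "model": {
                    "refId": "C",
                    "type": "threshold",
                    "expression": "B",
                    "conditions": [{"evaluator": {"type": "lt", "params": [0]}}],
                },
            },
        ],
        "noDataState": "Alerting",  # a silent node fires this rule
        "execErrState": "Error",
        "for": "5m",
    }


def create_rule_request(host: str, token: str) -> urllib.request.Request:
    """Request to send at node startup: create this node's No Data rule."""
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/v1/provisioning/alert-rules",
        data=json.dumps(build_nodata_rule(host)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def delete_rule_request(host: str, token: str) -> urllib.request.Request:
    """Request to send at node shutdown: delete this node's No Data rule."""
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/v1/provisioning/alert-rules/{rule_uid(host)}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
```

On boot a node would call `urllib.request.urlopen(create_rule_request(hostname, token))`, and its shutdown hook would do the same with `delete_rule_request(...)`. Deleting the rule as part of a normal scale-down is what keeps that node from paging anyone, while the grouped multi-dimensional rules leave No Data handling off.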
Thanks in advance for any help!