Best practices for an alerting dashboard supporting hundreds of hosts


I’ve been making dashboards for individual hosts and they’re coming out great. But since Grafana does not yet support alerting on queries that use template variables (it sounds like a hard problem to solve), what’s the best way to build a panel that alerts on, say, high disk usage across all volumes on all hosts?

My naive approach is to make one big graph panel with a single query, in this case against InfluxDB:

SELECT mean("used_percent") FROM "disk" WHERE $timeFilter GROUP BY time($__interval), "host", "path" fill(null)

And then alert when it’s greater than 75%. This works, but the visualization gets pretty dense with anything over 20 hosts with 2 volumes each, and it seems like it might be a bit expensive to run. Is there an approach that scales better? Thanks!

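Try Prometheus + Alertmanager connected to Grafana. That might help: the alert rules are evaluated by Prometheus rather than by a dashboard panel, and one rule covers every host and volume that matches it.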
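
A rough sketch of what such a rule could look like, assuming each host runs node_exporter (so the `node_filesystem_*` metrics exist) and that the 75% threshold from the question is what you want. The group name, alert name, `for` duration, `fstype` filter, and severity label are placeholders I made up, so adjust them to your setup:

```yaml
groups:
  - name: disk-usage            # hypothetical group name
    rules:
      - alert: HighDiskUsage    # hypothetical alert name
        # Percent used per host and mount point; fires for each series above 75.
        expr: |
          100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
                   / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 75
        for: 10m                # assumed: must stay above the threshold for 10 minutes
        labels:
          severity: warning     # assumed label, used for routing in Alertmanager
        annotations:
          summary: "Disk usage above 75% on {{ $labels.instance }} ({{ $labels.mountpoint }})"
```

Because the expression returns one series per instance and mount point, a single rule produces a separate alert for each volume that crosses the threshold, and Grafana is left to do only the visualization.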