Having used Grafana for a few years, I really like most of the visualizations, and generally it is pretty easy to work with. However, after building out a few pretty complex dashboards with it, I keep running into the same problem: alerting flat out sucks.
There seems to be no consistency as to which panels can and cannot generate alerts. On the panels that can generate alerts, normal features such as template variables are not supported. This leads to quite a bit of frustration.
Is Grafana not really meant to be used as an alerting platform? At the end of the day, I think it's a cool-looking tool that could be a home run if it could alert on every panel and use template variables in alerts. I am sure there is something I am missing as well, so this isn't a comprehensive list…
Here are two use cases, and perhaps this is the wrong way to use Grafana:
Have 100 servers; want to monitor IOPS and alert if they exceed one threshold while latency stays above another threshold for 1 minute. This should work on all servers configured in the dashboard. Metrics are pulled from Prometheus, and each server is a value of the $Server variable, which the dashboard repeats on.
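As a rough illustration of the condition in this first use case, here is a minimal Python sketch. It is plain Python, not Grafana or Prometheus configuration, and the function name `servers_in_alert`, the sample layout, and the limits are all hypothetical. It assumes each server has a list of (timestamp, IOPS, latency) samples and treats "for 1 min" as "every sample in the trailing 60 seconds breaches both limits".

```python
from typing import Dict, List, Tuple

# One sample per scrape: (timestamp in seconds, IOPS, latency in ms).
# This layout is an assumption made for the sketch.
Sample = Tuple[float, float, float]


def servers_in_alert(
    samples_by_server: Dict[str, List[Sample]],
    iops_limit: float,
    latency_limit_ms: float,
    hold_seconds: float = 60.0,
) -> List[str]:
    """Return servers whose IOPS and latency both stayed above their
    limits for the whole trailing hold_seconds window."""
    firing = []
    for server, samples in samples_by_server.items():
        if not samples:
            continue
        ordered = sorted(samples)
        newest = ordered[-1][0]
        window = [s for s in ordered if s[0] >= newest - hold_seconds]
        # Require the window to actually span the hold period, so a
        # single bad sample cannot fire the alert on its own.
        spans_hold = newest - window[0][0] >= hold_seconds
        all_breaching = all(
            iops > iops_limit and latency > latency_limit_ms
            for _, iops, latency in window
        )
        if spans_hold and all_breaching:
            firing.append(server)
    return sorted(firing)
```

The point of the sketch is that the same rule is evaluated once per server, which is what a per-variable alert would need to do for every value of $Server.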
Have 100 servers with varying disks: for example, one will have a C volume, another will have C and D. For all servers, monitor the % utilization of every volume and alert if any of them exceeds a threshold.
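The second use case can be sketched the same way. Again, this is a hypothetical Python illustration rather than anything Grafana provides: the name `volumes_over_threshold` and the input layout (server, then volume, then used and total bytes) are assumptions. What it shows is that the check has to discover whatever volumes each server reports instead of relying on a fixed list.

```python
from typing import Dict, List, Tuple

# server -> volume name -> (used bytes, total bytes).
# This layout is an assumption made for the sketch.
Volumes = Dict[str, Dict[str, Tuple[float, float]]]


def volumes_over_threshold(
    volumes: Volumes, threshold_pct: float
) -> List[Tuple[str, str, float]]:
    """Return (server, volume, percent used) for every volume whose
    utilization is above threshold_pct."""
    over = []
    for server, disks in volumes.items():
        # Each server may report a different set of volumes.
        for volume, (used, total) in disks.items():
            if total <= 0:
                continue  # skip volumes reporting no capacity
            pct = 100.0 * used / total
            if pct > threshold_pct:
                over.append((server, volume, round(pct, 1)))
    return sorted(over)
```

For example, a server with only C and another with C and D are handled by the same call, with no per-server configuration.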
Not trying to rant, just wanting to provide some feedback.