I really love your product, and it generates a lot of value for my team and me. So first off, thanks for all your good work.
Our system has multiple applications monitored by Grafana (Graphite datasource) that report similar metrics.
After following torkelo’s advice here, I started managing alerts with wildcards, since most of the apps have common metrics to alert on.
Given the metrics: domain.app1.Metric1, domain.app2.Metric1
the query is: domain.*.Metric1
and the alert is: when avg() of query(A, 1h, now) is above X
This post talks about alerting when applications are down, but it doesn’t work in our setup: setting “no data or null values” to “alerting” won’t trigger an alert when app1 goes down, because app2 still reports data and the wildcard query is never empty.
I really want to avoid having application-specific dashboards, since that’s a mess that’s very hard to maintain.
Ideally, I would want Grafana to keep track of the applications it has seen and alert if one of them disappears for more than 1h. An application that was killed on purpose could then be manually removed from that list. That way we could be sure all our apps are running.
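To make the idea concrete, here is a rough Python sketch of the behaviour I have in mind. It is only an illustration of the logic, not something Grafana does today as far as I know; the `AppTracker` class and its method names are made up for this example.

```python
from datetime import datetime, timedelta


class AppTracker:
    """Hypothetical tracker: remembers when each app last reported data."""

    def __init__(self, max_silence=timedelta(hours=1)):
        self.max_silence = max_silence
        self.last_seen = {}  # app name -> datetime of the latest datapoint

    def record(self, app, when):
        # Keep only the most recent timestamp seen for each app.
        previous = self.last_seen.get(app)
        if previous is None or when > previous:
            self.last_seen[app] = when

    def forget(self, app):
        # Manual removal of an app that was shut down on purpose.
        self.last_seen.pop(app, None)

    def missing(self, now):
        # Apps seen before that have been silent for longer than max_silence.
        return sorted(
            app
            for app, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


if __name__ == "__main__":
    tracker = AppTracker()
    start = datetime(2024, 1, 1, 12, 0)
    tracker.record("app1", start)
    tracker.record("app2", start)
    # Two hours later only app1 is still reporting.
    later = start + timedelta(hours=2)
    tracker.record("app1", later)
    print(tracker.missing(later))  # app2 should be flagged
    tracker.forget("app2")         # app2 was decommissioned on purpose
    print(tracker.missing(later))  # nothing left to alert on
```

The alert would fire whenever `missing()` returns a non-empty list, no matter how many other apps still match the wildcard.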
Could you please give me your take on this issue?