Alerting when applications or services go down in multivalue queries

Hi,
I really love your product, and it generates a lot of value for my team and me. So first off, thanks for all your good work.

Our system has multiple applications monitored by Grafana (Graphite datasource) that expose similar metrics.
After following torkelo’s advice here, I started managing alerts with wildcards, since most of the apps have common metrics to alert on.

For example, given the metrics domain.app1.Metric1 and domain.app2.Metric1:
query: domain.*.Metric1
alert: when avg() of query(A, 1h, now) is above X

This post talks about alerting when applications are down, but it doesn’t work in our setup: setting “no data or null values” to “alerting” won’t trigger an alert when app1 goes down, because app2 still reports data and the wildcard query is never empty.

I really want to avoid having application-specific dashboards, since that’s a mess that’s very hard to maintain.
Ideally, I would want Grafana to keep track of the applications it has seen, and to alert if one disappears for more than 1h. An application that was purposefully killed could then be manually removed from that list. Otherwise, this makes sure all our apps are running.
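
To make the idea concrete, here is a minimal sketch of the "remember what has been seen" logic in Python. This is not a Grafana feature; the class name SeenApps and its methods are hypothetical, and it assumes something outside Grafana feeds it the timestamp of each application's latest datapoint.

```python
class SeenApps:
    """Hypothetical tracker: remembers when each application last reported data."""

    def __init__(self, max_silence_seconds=3600):
        # Assumption: an app is considered down after 1h without data.
        self.max_silence = max_silence_seconds
        self.last_seen = {}  # app name -> unix timestamp of its latest datapoint

    def observe(self, app, timestamp):
        """Record a datapoint for an app, keeping only the newest timestamp."""
        previous = self.last_seen.get(app)
        if previous is None or timestamp > previous:
            self.last_seen[app] = timestamp

    def remove(self, app):
        """Manually forget an app that was shut down on purpose."""
        self.last_seen.pop(app, None)

    def missing(self, now):
        """Return the apps that have been silent for longer than max_silence."""
        return sorted(
            app for app, ts in self.last_seen.items()
            if now - ts > self.max_silence
        )
```

Calling missing() on a schedule and alerting on a non-empty result would cover the "app disappeared" case, while remove() covers apps that were retired deliberately.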

Could you please give me your take on this issue?
Thanks

A condition with “HAS NO VALUE” might work for wildcard queries:

[image]

Hopefully, we will be implementing this feature in the near future, which will make alerting per series a lot easier:

Thanks for the response!

Right now I have it set up just like that, but sadly, as long as at least one metric is still reporting, the “HAS NO VALUE” alert does not fire.

Is there some other configuration, or even another alerting paradigm (other than app-specific alerts), that would make it possible to alert when an application goes down?

Thanks

This is definitely a case we want to support in Grafana in the future.

To fix your issue, I think you might have to try a few different Graphite queries to get what you want. Maybe you could use the countSeries function, or the alert condition “when count()”, in combination with the removeBelowValue function?
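
As a rough illustration of the counting idea, the sketch below does the same check outside Grafana, in Python, on a response shaped like Graphite's render API JSON output (a list of objects with "target" and "datapoints", each datapoint being [value, timestamp]). The function names and the expected_count parameter are assumptions made for this example; the expected number of applications has to come from somewhere you maintain yourself.

```python
def series_with_data(render_json):
    """Return the targets that have at least one non-null datapoint."""
    return sorted(
        series["target"]
        for series in render_json
        if any(value is not None for value, _ts in series["datapoints"])
    )


def count_alert(render_json, expected_count):
    """Return (should_alert, active_targets).

    should_alert is True when fewer series report data than expected_count,
    which is an assumed, manually maintained number of running applications.
    """
    active = series_with_data(render_json)
    return len(active) < expected_count, active
```

With two applications expected and only domain.app1.Metric1 returning non-null values, count_alert reports an alert even though the wildcard query as a whole still has data.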

Has anyone found a way to do this? The closest I have got is this:

> SELECT COUNT(*) FROM (SELECT mean("value") FROM "measurement" WHERE time >= now()-1m GROUP BY "tag" fill(null))
name: value
time count_value
---- -----------
0    39052

My plan was then to either integrate or differentiate the count (I need to remember my high school math first) and alert if there is a gradient, i.e. if the number of series changes. But because the result has no time value (time is 0), this does not work.
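
For what it is worth, the check I have in mind looks roughly like the Python sketch below. It assumes the series count is sampled once per interval by something external, so that each count has a real timestamp; the function name count_drops and the sampling itself are hypothetical, not something InfluxDB or Grafana provides.

```python
def count_drops(samples):
    """Find the steps where the number of reporting series decreased.

    samples: list of (timestamp, series_count) tuples, ordered by time.
    Returns a list of (timestamp, delta) for every step with a negative delta.
    """
    drops = []
    for (_prev_ts, prev_count), (ts, count) in zip(samples, samples[1:]):
        delta = count - prev_count  # discrete "gradient" between two samples
        if delta < 0:
            drops.append((ts, delta))
    return drops
```

An alert would fire whenever count_drops returns a non-empty list for the most recent samples.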