Multidimensional alert with multiple thresholds

Hi!

Goal
I’ve set up application metrics to ship with an ‘availability’ label. I’m planning to set up black-box monitoring for a series of applications with different ‘availability’ levels.
To that end, I’m setting up some multidimensional alerts and am looking for the best way to go about this. I realize that even with black-box monitoring, the norms, or thresholds, for each application are going to be different. My plan is to use thresholds based on the ‘availability’ rating as a sensible default and allow overrides in some way (e.g. with an extra alert rule and nested notification policies to redirect the default to /dev/null).

What I have
Here’s an example with one threshold:

Query A: sum(increase(logback_events_total{level="error"}[10m])) by (app_kubernetes_io_name, company.com/availability, company.com/team)
Expression B: Reduce max A
Expression C: Math $B > 5
(a notification policy then redirects alerts to the right team, based on company.com/team)

One way to do it
Now, to go to multiple thresholds, I can add a filter on availability to query A (company.com/availability="2"), duplicate that query for each of the 4 possible values, duplicate the expressions too, and make the final expression something like $E > 10 || $F > 5 || $G > 3 || $H > 1.

This has a lot of duplication in it, and if there is no application with a given availability yet, the alert does not show correctly in the GUI, because the preview can’t handle no-data situations.

Another way to do it
I could do the same, but in 4 different alerts, which doesn’t have that last problem, but has a lot of duplication as well.

What I’m looking for
What I would actually want is something like this (promQL style), in expression C:
$B{mycompany.com/availability="1"} > 10 || $B{mycompany.com/availability="2"} > 5 || $B{mycompany.com/availability="3"} > 3 || $B{mycompany.com/availability="4"} > 1
The above is invalid and, as far as I can see, expressions cannot operate on labels, but is there any other way to do this?

Bonus question: Is there a way to store the thresholds separate from the alert definitions (such as we can do with constants in the dashboards or with Prometheus recording rules), but in a way that they can be adjusted in the Grafana GUI?
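
One possible direction, sketched in PromQL rather than in Grafana expressions, is to let Prometheus vector matching pick the threshold per series. This is only a sketch under assumptions: the label names are written in a sanitized form (company_com_availability instead of company.com/availability), and availability_error_threshold is a hypothetical series, for example produced by recording rules, with one sample per availability level. Query A would then need no duplication:

```promql
# Assumed, hypothetical threshold series (one sample per availability level), e.g.:
#   availability_error_threshold{company_com_availability="1"} 10
#   availability_error_threshold{company_com_availability="2"} 5
#   availability_error_threshold{company_com_availability="3"} 3
#   availability_error_threshold{company_com_availability="4"} 1

# Error count per application over 10 minutes, compared against the
# threshold that carries the same availability value (many-to-one match).
sum by (app_kubernetes_io_name, company_com_availability, company_com_team) (
  increase(logback_events_total{level="error"}[10m])
)
> on (company_com_availability) group_left ()
  availability_error_threshold
```

Only series above their own threshold are returned, so the final Math expression could probably shrink to something like $B > 0. Two caveats: the thresholds would live in Prometheus rather than being adjustable from the Grafana GUI, and series without an availability label match no threshold and so would never fire.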

hi there,

Can you please specify your Grafana version? Alerting gets a ton of rapid development, so this is really contingent on your version. I’m guessing you are using the new Unified Alerting platform and not the legacy platform :+1:

Hi!

Yes, it’s with the unified alerting system. At the time of writing, the Grafana version was 9.0.5.

It took me a while to realize that if you start with an existing dashboard that got converted during the migration to Grafana 9, its alert gets turned into a ‘legacy’ query, and you can’t create multidimensional alerts from those. By recreating the alert and using the new type, plus a reduce and a math expression, I could make the query as described above.

I tried to make the post as short as possible while keeping all the info, but on rereading, it might have been a bit too short, which made the goal hard to follow.

The question is basically:

  1. Most of my metrics have an “availability” label, which is numeric (1-4).
  2. I would like the (unified) alert to fire only if the availability label for the metric is higher than a constant or doesn’t exist, preferably without having 4 different queries in each alert.

If that works, I’d like to improve it like so:

A) can I store the constant separate from the alert query?
B) some of my metrics have an “availability-override” label. Same range. If this label is set, then the query should compare against this number instead of the constant.
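
For B), one idea that might work on the query side is to derive an "effective" availability with label_replace before comparing. This is a rough sketch with assumptions: the label names are sanitized forms (company_com_availability, company_com_availability_override), effective_availability is a made-up helper label, and availability_error_threshold is a hypothetical series with one threshold sample per availability level.

```promql
# Step 1 (inner label_replace): copy the regular availability into a
# helper label "effective_availability".
# Step 2 (outer label_replace): overwrite it with the override, but only
# when the override label is non-empty; "(.+)" does not match an empty
# value, in which case the series is returned unchanged.
label_replace(
  label_replace(
    sum by (app_kubernetes_io_name, company_com_availability, company_com_availability_override, company_com_team) (
      increase(logback_events_total{level="error"}[10m])
    ),
    "effective_availability", "$1", "company_com_availability", "(.+)"
  ),
  "effective_availability", "$1", "company_com_availability_override", "(.+)"
)
# Compare each series with the threshold of its effective level.
> on (effective_availability) group_left ()
  label_replace(
    availability_error_threshold,
    "effective_availability", "$1", "company_com_availability", "(.+)"
  )
```

Since the thresholds sit in their own series, this would also cover A) to some extent, although not in a way that can be edited from the Grafana GUI.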