Dynamic alerting thresholds in unified alerting

Dynamic alert thresholds could be achieved by joining the time-series query with data from another source (like a CSV or SQL table). That would produce some extra columns with thresholds that could be referred to in an alert expression.

I tried to do this in Grafana unified alerting, but I ran into restrictions on the format of the data produced by the query (it needs to be wide-format time-series data).
There are some exceptions mentioned here: Alerting on numeric data | Grafana documentation
But overall it still seems practically impossible to do.

Next I will try to get this to work in InfluxDB alerting. Does anybody have a better idea?

This post has been unanswered for 21 days now, so I should probably add some info:

What do I mean by “dynamic alerting thresholds”?
In Grafana unified alerting you can set a threshold like: If value > 80 then alert
Grafana users can modify the threshold (80 in this case) to another value. So, in that sense the threshold is dynamic.
However, what I want is that the value “80” is read from another datasource like so:
If value > thresholdFromDatasource then alert.
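
To make the idea concrete outside of Grafana, here is a minimal Python sketch of the comparison I am after. The series names and numbers are made up for illustration, and the two dictionaries stand in for the time-series query result and the threshold datasource; this is not Grafana's expression syntax.

```python
def evaluate_dynamic_thresholds(latest_values, thresholds):
    """Return the names of series whose latest value exceeds their own threshold."""
    firing = []
    for name, value in latest_values.items():
        threshold = thresholds.get(name)
        if threshold is None:
            continue  # no threshold configured for this series, so it cannot alert
        if value > threshold:
            firing.append(name)
    return sorted(firing)


# Hypothetical data: latest value per series, and thresholds read from another source.
latest_values = {"boiler_temp": 85.0, "tank_level": 40.0, "room_temp": 21.5}
thresholds = {"boiler_temp": 80.0, "tank_level": 90.0}

print(evaluate_dynamic_thresholds(latest_values, thresholds))
```

The point is that each series is compared against its own threshold, matched by name, instead of one fixed number typed into the alert rule.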


I actually found some interesting info in this webinar video. 23 minutes into the video they show a method that introduces a dynamic threshold from another datasource. I have not tried this yet, but I’m afraid I may run into trouble when using InfluxDB instead of MySQL queries.


Turns out I needed to reorganize the schema of my InfluxDB bucket. Instead of having one measurement per “channel” I now have one measurement with one tag per “channel” of measured data. I’m still struggling with the math function that combines all the queries, but I think I will manage…

Just to join in on your lonely thread, where did you learn that you need to “have one measurement with one tag per ‘channel’ of measured data”? I am curious to see your InfluxDB query and resulting alerts.

I am not using Dynamic Alerting Thresholds, but have struggled with Multi Dimensional Alerts. Am planning to do a writeup soon of my workaround and will post here.

Hi grant2,

Thanks for joining!
At first I was under the impression that 1 measurement was equivalent to, say, 1 temperature sensor. So for 100 sensors I would use 100 measurements. But from the InfluxDB sample data it became clear that that’s probably not the way to go. I held on to 1 measurement per sensor for some time, but decided to switch to 1 measurement for all sensors when I was faced with the dynamic alert thresholds challenge.

Tomorrow I’ll be in the office and will share some more info on the queries that I used…

Meanwhile I’ve noticed a discrepancy between the period selection in the query editor for an alert and the alert-query runtime results. Instead of referring to the period selection (i.e. v.blabla) I now use “-1h to now()” as time period. What’s your experience with that?

I’m curious about your writeup! What’s a multidimensional alert?

Hi @kortenbach, in a nutshell, here is how my InfluxDB (v2.3) is organized:

measurement: call-it-anything-that-makes-sense (in my case, “Single Chamber Furnace”)

fields: temperature, humidity

tags:

  • furnace_number: 2, 6, 13
  • measurement_type: actual, setpoint
  • zone: upper_chamber, lower_chamber

In the above case, you can see the measurement (at least what InfluxDB calls a measurement) is a generic term that includes:

  • two fields (but could be dozens)
  • three furnace numbers (but could be dozens)
  • two measurement types
  • two zones (but could be dozens)

When I grab a temperature reading and send it to InfluxDB, I specify all the applicable tags, and (when applicable and when possible) use the exact same timestamp.
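
As a rough sketch of what such a write could look like, the Python below formats one setpoint point and one actual point as InfluxDB line protocol with identical timestamps. The helper names (`escape_tag`, `escape_measurement`, `to_line`) and the timestamp are hypothetical and only for illustration; in practice a client library would normally build these lines for you.

```python
def escape_tag(text):
    # Line protocol needs commas, equals signs and spaces escaped in tag keys,
    # tag values and field keys.
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def escape_measurement(text):
    # Measurement names need commas and spaces escaped.
    return text.replace(",", "\\,").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value,... field=value,... timestamp"""
    tag_part = ",".join(
        f"{escape_tag(key)}={escape_tag(value)}" for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(key)}={float(value)}" for key, value in fields.items()
    )
    return f"{escape_measurement(measurement)},{tag_part} {field_part} {timestamp_ns}"


timestamp_ns = 1_700_000_000_000_000_000  # made-up timestamp, shared by both points
common_tags = {"furnace_number": "2", "zone": "upper_chamber"}

actual_line = to_line(
    "Single Chamber Furnace",
    {**common_tags, "measurement_type": "actual"},
    {"temperature": 508},
    timestamp_ns,
)
setpoint_line = to_line(
    "Single Chamber Furnace",
    {**common_tags, "measurement_type": "setpoint"},
    {"temperature": 500},
    timestamp_ns,
)
print(actual_line)
print(setpoint_line)
```

Because both points carry the same timestamp and differ only in the measurement_type tag, a query can later line them up row by row.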

For example, suppose the temperature controller at the upper chamber of furnace 2 has a setpoint of 500 and the actual reading is 508, both at a time of 13:40:33 (plus milliseconds). For the alert, I write the query as setpoint vs. actual, and if the difference exceeds a threshold of, say, 10, then the alert fires.

If I move the setpoint to 750, the alert will fire until the actual is between 740 and 760.

So while I am not dynamically setting the threshold, I am using a query of my data (which contains the setpoint) so that I can simply alert off of the absolute value of the setpoint LESS actual value.
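
The alert condition itself boils down to something like the Python sketch below. The function name and the default of 10 are just for illustration; in my setup the comparison is done in the alert query, not in Python.

```python
def deviation_alert(setpoint, actual, max_deviation=10.0):
    """Fire when the actual value is more than max_deviation away from the setpoint."""
    return abs(setpoint - actual) > max_deviation


print(deviation_alert(500, 508))  # 8 away from the setpoint
print(deviation_alert(750, 508))  # setpoint just moved, actual has not caught up yet
print(deviation_alert(750, 745))  # back inside the 740 to 760 band
```

Since the setpoint travels with the data, moving it moves the alert band along with it and the rule itself never needs editing.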

Does any of this apply to your situation?

@grant2,

It seems that our applications have a lot in common. Both deal with real-world process information. However, my data is unstructured. I only have a name and a value. All names and their historical values are (now) 1 measurement. I see no added value in structuring the data any more than that (for now).
I made synoptic panels using a Flash-like program called Wick Editor. There I can display dynamic data and create input fields for setpoints. These panels are configured using configuration files that specify which datapoint is connected to which graphic elements.
I also made a system that enables me to switch between live and historical synoptic data (kinda like the time slider on a YouTube video).
So I’m building a SCADA-like application using Node-RED as glue between the different applications.
My setpoints and alarm thresholds will be either fixed or time-dependent following a week scheme.
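
A week scheme like that could be looked up along these lines. This is only a sketch: the schedule entries, the default value and the names `WEEK_SCHEME`, `DEFAULT_THRESHOLD` and `threshold_at` are invented for the example, and in practice the lookup could live in Node-RED or in a table that the alert query joins against.

```python
from datetime import datetime

# Hypothetical week scheme: (weekdays, start hour, end hour, threshold).
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
WEEK_SCHEME = [
    (range(0, 5), 7, 18, 80.0),  # working hours on Monday to Friday
]
DEFAULT_THRESHOLD = 60.0  # used outside every scheduled window


def threshold_at(moment):
    """Return the alarm threshold that applies at the given datetime."""
    for weekdays, start_hour, end_hour, threshold in WEEK_SCHEME:
        if moment.weekday() in weekdays and start_hour <= moment.hour < end_hour:
            return threshold
    return DEFAULT_THRESHOLD


print(threshold_at(datetime(2024, 1, 1, 9, 0)))   # a Monday morning
print(threshold_at(datetime(2024, 1, 6, 9, 0)))   # a Saturday morning
```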