Dynamic alerting thresholds in unified alerting

Dynamic alert thresholds could be achieved by joining the time-series query with data from another source (like a CSV or SQL table). That would produce some extra columns with thresholds that could be referred to in an alert expression.

I tried to do this in Grafana unified alerting, but it seems to be practically impossible because of restrictions on the format of the data that the query produces (it needs to be wide-format time-series data).
There are some exceptions mentioned here: Alerting on numeric data | Grafana documentation

Next I will try to get this to work in InfluxDB alerting. Does anybody have a better idea?

This post has now been unanswered for 21 days, so I should probably add some info:

What do I mean by “dynamic alerting thresholds”?
In Grafana unified alerting you can set a threshold like: If value > 80 then alert
Grafana users can modify the threshold (80 in this case) to another value. So, in that sense the threshold is dynamic.
However, what I want is that the value “80” is read from another datasource like so:
If value > thresholdFromDatasource then alert.
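
To make the idea concrete, here is a minimal Python sketch of the comparison, outside of Grafana. It is not how unified alerting evaluates rules; the series names, the `evaluate_alerts` helper, and the sample numbers are all made up for illustration. The only point is that each series is compared against its own threshold, looked up from a second source.

```python
from typing import Dict, List


def evaluate_alerts(latest_values: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[str]:
    """Return the names of series whose latest value exceeds their own threshold."""
    firing = []
    for name, value in latest_values.items():
        threshold = thresholds.get(name)
        if threshold is None:
            continue  # no threshold configured for this series, so skip it
        if value > threshold:
            firing.append(name)
    return sorted(firing)


# Hypothetical data: latest reading per series (e.g. from the time-series query)
latest_values = {"sensor_a": 85.0, "sensor_b": 40.0, "sensor_c": 99.0}
# Hypothetical thresholds loaded from another source (e.g. a CSV or SQL table)
thresholds = {"sensor_a": 80.0, "sensor_b": 50.0}

print(evaluate_alerts(latest_values, thresholds))
```

Series without a configured threshold are skipped here; whether that is the right behaviour (or whether a missing threshold should itself raise an alert) is a design choice.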

I actually found some interesting info in this webinar video. About 23 minutes in, they show a method that introduces a dynamic threshold from another datasource. I have not tried this yet, but I’m afraid I may run into trouble when using InfluxDB instead of MySQL queries.

Turns out I needed to reorganize the schema of my InfluxDB bucket. Instead of having one measurement per “channel” I now have one measurement with one tag per “channel” of measured data. I’m still struggling with the math function that combines all the queries, but I think I will manage…

Just to join in on your lonely thread, where did you learn that you need to "have one measurement with one tag per “channel” of measured data"? I am curious to see your InfluxDB query and resulting alerts.

I am not using Dynamic Alerting Thresholds, but have struggled with Multi Dimensional Alerts. Am planning to do a writeup soon of my workaround and will post here.

Hi grant2,

Thanks for joining!
At first I was under the impression that one measurement was equivalent to, say, one temperature sensor. So for 100 sensors I would use 100 measurements. But from the InfluxDB sample data it became clear that that’s probably not the way to go. I held on to one measurement per sensor for some time, but decided to switch to one measurement for all sensors when I was faced with the dynamic alert thresholds challenge.

Tomorrow I’ll be in the office and will share some more Info on the queries that I used…

Meanwhile I’ve noticed a discrepancy between the period selection in the query editor for an alert and the alert-query runtime results. Instead of referring to the period selection (i.e. v.blabla) I now use “-1h to now()” as time period. What’s your experience with that?

I’m curious about your writeup! What’s a multidimensional alert?

Hi @kortenbach, in a nutshell, here is how my InfluxDB (v2.3) is organized:

measurement: call-it-anything-that-makes-sense (in my case, “Single Chamber Furnace”)

fields: temperature, humidity

tags:

  • furnace_number: 2, 6, 13
  • measurement_type: actual, setpoint
  • zone: upper_chamber, lower_chamber

In the above case, you can see the measurement (at least what InfluxDB calls a measurement) is a generic term that includes:

  • two fields (but could be dozens)
  • three furnace numbers (but could be dozens)
  • two measurement types
  • two zones (but could be dozens)

When I grab a temperature reading and send to InfluxDB, I specify all the applicable tags, and (when applicable and when possible) use the exact same timestamp.

For example, say the temperature controller at the upper chamber of furnace 2 has a setpoint of 500 and the actual reading is 508, at a time of 13:40:33 (plus milliseconds). For the alert, I write the query as setpoint vs actual, and if the difference exceeds a threshold of, say, 10, then the alert fires.

If I move the setpoint to 750, the alert will fire until the actual is between 740 and 760.

So while I am not dynamically setting the threshold, I am using a query of my data (which contains the setpoint) so that I can simply alert off of the absolute value of the setpoint LESS actual value.
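
As a rough illustration of that logic (not my actual Flux query), here is a small Python sketch. The `deviation_alerts` helper and the sample readings are hypothetical; the tag names and the tolerance of 10 come from the description above.

```python
from typing import Dict, List, Tuple

# Key: (furnace_number, zone); value: readings keyed by measurement_type
Readings = Dict[Tuple[str, str], Dict[str, float]]


def deviation_alerts(readings: Readings, tolerance: float = 10.0) -> List[Tuple[str, str]]:
    """Return the (furnace, zone) pairs where |setpoint - actual| exceeds the tolerance."""
    firing = []
    for key, values in readings.items():
        if "setpoint" not in values or "actual" not in values:
            continue  # both values are needed to compute a deviation
        if abs(values["setpoint"] - values["actual"]) > tolerance:
            firing.append(key)
    return sorted(firing)


# Hypothetical sample: same timestamp, one entry per furnace/zone combination
readings: Readings = {
    ("2", "upper_chamber"): {"setpoint": 500.0, "actual": 508.0},  # deviation 8
    ("2", "lower_chamber"): {"setpoint": 750.0, "actual": 508.0},  # deviation 242
}

print(deviation_alerts(readings))
```

Because the setpoint travels with the data, the effective alert band moves whenever the setpoint moves, without touching the alert rule.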

Does any of this apply to your situation?

@grant2,

It seems that our applications have a lot in common. Both deal with real-world process information. However, my data is unstructured: I only have a name and a value. All names and their historical values are (now) in one measurement. I see no added value in structuring the data any more than that (for now).
I made synoptic panels using a Flash-like program called Wick Editor. There I can display dynamic data and create input fields for setpoints. These panels are configured using configuration files that specify which datapoint is connected to which graphic elements.
I also made a system that enables me to switch between live and historical synoptic data (kinda like a YouTube time slider).
So I’m building a SCADA-like application using Node-RED as glue between the different applications.
My setpoints and alarm thresholds will be either fixed or time-dependent following a week scheme.
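
A time-dependent threshold following a week scheme could look roughly like the Python sketch below. The schedule format, the `threshold_at` helper, and the numbers are assumptions for illustration only; in practice this logic might live in Node-RED or in a table that the alert query joins against.

```python
from datetime import datetime
from typing import List, Tuple

# (weekday, start_hour, end_hour, threshold); weekday 0 = Monday, end_hour is exclusive
Schedule = List[Tuple[int, int, int, float]]


def threshold_at(when: datetime, schedule: Schedule, default: float) -> float:
    """Return the threshold that applies at `when`, or `default` if no slot matches."""
    for weekday, start_hour, end_hour, threshold in schedule:
        if when.weekday() == weekday and start_hour <= when.hour < end_hour:
            return threshold
    return default


# Hypothetical week scheme: a stricter threshold during working hours, Monday to Friday
schedule: Schedule = [(day, 8, 18, 80.0) for day in range(5)]

monday_morning = datetime(2024, 1, 1, 9, 0)    # 1 January 2024 was a Monday
saturday_morning = datetime(2024, 1, 6, 9, 0)

print(threshold_at(monday_morning, schedule, default=95.0))
print(threshold_at(saturday_morning, schedule, default=95.0))
```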

I know that this is an old post, but I managed to do the variable thresholds using the new alerting and InfluxDB. The next step for me will be using a data manipulation plugin to have an input for each individual threshold on the same dashboard that monitors the alerts, so the user can tweak the threshold if needed.

The trick was to have the same “schema” on both sources. Since right now I don’t have the values in a database, I used array.from() to create a table based on an array. As long as I can manipulate the data to have the same schema (basically with the pivot() and group() functions in InfluxDB), I believe it is possible using any other data source.
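
The "same schema" idea can be sketched in plain Python, independent of Flux. This is not the query I used; the column names (`channel`, `time`, `value`, `threshold`), the helper functions, and the sample rows are assumptions. The thresholds come from a hard-coded list of rows, similar in spirit to building a table from an array, and both sides are reduced to one row per channel before joining.

```python
from typing import Dict, Iterable, List


def last_value_per_channel(rows: Iterable[dict]) -> Dict[str, float]:
    """Collapse long-format rows to the most recent value per channel."""
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["channel"])
        if current is None or row["time"] >= current["time"]:
            latest[row["channel"]] = row
    return {channel: row["value"] for channel, row in latest.items()}


def join_on_channel(values: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[dict]:
    """Inner-join values and thresholds on the shared channel key."""
    return [
        {"channel": ch, "value": v, "threshold": thresholds[ch],
         "breached": v > thresholds[ch]}
        for ch, v in sorted(values.items())
        if ch in thresholds
    ]


# Hypothetical measurement rows, as a time-series query might return them
measurements = [
    {"time": 1, "channel": "tank_level", "value": 70.0},
    {"time": 2, "channel": "tank_level", "value": 92.0},
    {"time": 2, "channel": "pump_temp", "value": 55.0},
]
# Hypothetical threshold rows, hard-coded instead of read from a database
threshold_rows = [
    {"channel": "tank_level", "threshold": 90.0},
    {"channel": "pump_temp", "threshold": 60.0},
]
thresholds = {row["channel"]: row["threshold"] for row in threshold_rows}

joined = join_on_channel(last_value_per_channel(measurements), thresholds)
for row in joined:
    print(row)
```

Once both sides share the same key column, the comparison itself is trivial; most of the work is in getting the two sources into that shape.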

Same here, I found Node-RED very useful for all the logic that is not possible to do in either Grafana or InfluxDB.

Thanks for sharing @fercasjr

That’s a brilliant solution.

You need to post this solution in a grafana post!

Thanks for sharing! Looks like a nice way to do it!