Simple alert going into Pending unexpectedly

adamsdavid10 · October 17, 2023, 2:15pm

I have an alert set up to catch processes that fail to run. I’m trying to ensure that at least 2 processes ran within the past 5 minutes. I have this alert:

[Screenshot: alert rule configuration]

But I’m seeing the alert transition into Pending unexpectedly, and I need help making sense of it. It looks like it’s going into Pending based on the A and B values being 1…? I’m not sure where these numbers come from or what they represent.

jangaraj · October 17, 2023, 2:51pm

I guess you have a non-zero Pending period in your alert config:

Set it to 0 so that no pending period is used.

adamsdavid10 · October 17, 2023, 3:05pm

I’m trying to understand why it’s going into Pending in the first place. That doesn’t make sense to me. You can see in the graph that the green line (what the alert is based on) never falls below 2, which is my alert threshold value.

georgerobinson · October 17, 2023, 3:15pm

The reason is that the last value returned by your query (the value of the Reduce expression A) is 1, and 1 is below your threshold of 2.
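To make the Reduce → Last behaviour concrete, here is a minimal Python sketch. It is not Grafana's implementation: `reduce_last`, `breaches`, and the sample rows are hypothetical, and it assumes the query returns one row per time bucket, ordered by time. Only the threshold of 2 comes from the rule described in this thread.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[str, int]  # (bucket start "HH:MM", count of runs); hypothetical shape


def reduce_last(rows: Sequence[Row]) -> Optional[int]:
    """Return the value of the last row, or None if the query returned nothing."""
    return rows[-1][1] if rows else None


def breaches(rows: Sequence[Row], threshold: int = 2) -> bool:
    """True when the reduced value is below the threshold.

    An empty result is not treated as a breach here; in the real rule,
    no-data handling is configured separately.
    """
    last = reduce_last(rows)
    return last is not None and last < threshold


# Hypothetical result: the earlier bucket looks healthy, the newest one does not.
rows = [("12:40", 12), ("12:45", 1)]
print(reduce_last(rows))  # 1
print(breaches(rows))     # True
```

The point is that only the newest row matters to Last, so a healthy-looking earlier point on the graph does not prevent the condition from being met.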

adamsdavid10 · October 17, 2023, 3:17pm

What you’re saying makes sense, but I don’t see that happening in my graph. Maybe I don’t understand the Reduce → Last function. I understand it to always look at the latest value. If you look at my graph, though, the green line never falls below 10.

adamsdavid10 · October 17, 2023, 3:27pm

Could it be because my time grouping is 5 minutes, which is the same as the length of time the alert covers (now-5m to now)?

georgerobinson · October 17, 2023, 3:38pm

Hard to know, we can’t see the rest of the query. My suggestion would be to run the query interactively (outside Grafana) and check manually what the latest value is.

jangaraj · October 17, 2023, 3:49pm

I would like to see the state history for this alert.

adamsdavid10 · October 17, 2023, 3:55pm

[Screenshot: state history for the alert]

jangaraj · October 17, 2023, 4:17pm

You can see that the result from A is flapping between 1 and 4 or 5. It looks like the data in your DB is “delayed” by 5 minutes, so I would use a time range of the last 10 minutes for the alert query, to avoid the state where there is no data for the last 5 minutes.

adamsdavid10 · October 17, 2023, 4:18pm

OK, I’ll give that a shot.

adamsdavid10 · October 17, 2023, 6:02pm

@jangaraj Nope, it’s still regularly going into Pending.

georgerobinson · October 17, 2023, 7:42pm

I wouldn’t use the graph as a reference; instead, use the state history as @jangaraj mentioned. The points on the graph are aligned to the start of each 5-minute interval, but the alert rule is not guaranteed to be evaluated at that exact second.

If I had to guess, the query isn’t correct when run in between 5-minute offsets (e.g. at 12:48). Perhaps you could share the query?
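To illustrate why a run at 12:48 can differ from a run at 12:45, the sketch below floors timestamps to 5-minute buckets and reports how much of each bucket a now-5m to now window actually covers. Flooring is an assumption about how a 5-minute time grouping assigns rows to buckets, and the function names and the date are made up for illustration.

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def floor_to_bucket(ts: datetime, bucket: timedelta = BUCKET) -> datetime:
    """Floor a timestamp to the start of its bucket (assumed grouping behaviour)."""
    return EPOCH + ((ts - EPOCH) // bucket) * bucket


def bucket_coverage(now: datetime, window: timedelta = BUCKET):
    """List (bucket start, minutes of that bucket inside the window).

    The window is treated as half-open, [now - window, now), for simplicity.
    """
    start = now - window
    out = []
    b = floor_to_bucket(start)
    while b < now:
        covered = min(b + BUCKET, now) - max(b, start)
        out.append((b.strftime("%H:%M"), covered.total_seconds() / 60))
        b += BUCKET
    return out


print(bucket_coverage(datetime(2023, 10, 17, 12, 45)))  # [('12:40', 5.0)]
print(bucket_coverage(datetime(2023, 10, 17, 12, 48)))  # [('12:40', 2.0), ('12:45', 3.0)]
```

Evaluated exactly on a boundary, the window maps to one full bucket. Evaluated at 12:48, it is split into two partial buckets, and the newest one only contains 3 minutes of data.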

adamsdavid10 · October 17, 2023, 7:57pm

Sure, I can share some of it. I’m using SQL to loop through databases, execute a query in each, and store the results in a temporary table. The query that executes for each database has this bit in the WHERE clause:

    and accessed_timeIn AT TIME ZONE ''Central Standard Time'' AT TIME ZONE ''UTC'' <= ''' + $__timeTo() + '''
    and accessed_timeIn AT TIME ZONE ''Central Standard Time'' AT TIME ZONE ''UTC'' >= ''' + $__timeFrom() + '''

Here is the final query that is behind the graph and alert:

    SELECT
      $__timeGroup(accessed_timeIn, '5m') AS time,
      count(*) AS 'All Runs'
    FROM
      #CombinedResults
    GROUP BY
      $__timeGroup(accessed_timeIn, '5m')
    ORDER BY
      time;

jangaraj · October 17, 2023, 9:10pm

How did you configure the “Configure no data and error handling” section?

adamsdavid10 · October 17, 2023, 9:18pm

[Screenshot: the rule’s no data and error handling settings]

jangaraj · October 17, 2023, 9:24pm

Set “Alert state if no data or all values are null” to “No Data”. The current setting may be what is causing the alerting. Make sure your query returns some non-null data every time.

adamsdavid10 · October 17, 2023, 9:55pm

Hmm, I’m not sure I want that. If I don’t get any data back, then I want to be alerted. That would mean the processes I’m trying to monitor aren’t being executed, and I want to alert on that.

Also, if I were getting nulls, wouldn’t that show up in the alert’s state history?

jangaraj · October 17, 2023, 9:57pm

You want to investigate why the alert is going into the Pending state. That’s the task for now.

Later, of course, you can configure the alert based on your needs.

georgerobinson · October 18, 2023, 10:26am

    SELECT
      $__timeGroup(accessed_timeIn, '5m') AS time,
      count(*) AS 'All Runs'
    FROM
      #CombinedResults
    GROUP BY
      $__timeGroup(accessed_timeIn, '5m')
    ORDER BY
      time;

The rule is evaluated once per minute. That means the time ranges that are queried are (for example):

(09:00UTC, -5m) = 08:55 to 09:00
(09:01UTC, -5m) = 08:56 to 09:01
(09:02UTC, -5m) = 08:57 to 09:02
(09:03UTC, -5m) = 08:58 to 09:03
(09:04UTC, -5m) = 08:59 to 09:04
(09:05UTC, -5m) = 09:00 to 09:05

When are the processes expected to run? Are there potential windows here where just one process has run?
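As a rough way to reason about those windows, here is a small Python simulation. It is only a sketch under stated assumptions: the run schedule (one run per minute, 30 seconds past the minute) is hypothetical, `last_bucket_count` is a made-up helper, and the grouping is assumed to floor each timestamp to its 5-minute bucket. It evaluates the rule once per minute from 09:00 to 09:05 and records the count in the newest bucket, which is what Reduce → Last would pick up.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def last_bucket_count(runs, now, window=BUCKET):
    """Group runs inside [now - window, now] into 5-minute buckets
    and return the count of the newest bucket (None if no rows)."""
    counts = Counter(
        EPOCH + ((t - EPOCH) // BUCKET) * BUCKET  # floor to bucket start
        for t in runs
        if now - window <= t <= now
    )
    return counts[max(counts)] if counts else None


# Hypothetical schedule: one run per minute, 30 seconds past the minute.
first = datetime(2023, 10, 18, 8, 50, 30)
runs = [first + timedelta(minutes=i) for i in range(20)]

results = {}
for minute in range(6):  # evaluations at 09:00 .. 09:05
    now = datetime(2023, 10, 18, 9, minute)
    results[now.strftime("%H:%M")] = last_bucket_count(runs, now)

print(results)
# {'09:00': 5, '09:01': 1, '09:02': 2, '09:03': 3, '09:04': 4, '09:05': 5}
```

With this made-up schedule every 5-minute window contains 5 runs in total, yet the newest bucket holds only 1 run when the rule is evaluated at 09:01, which is below a threshold of 2. Summing the counts across all returned buckets, rather than taking the last one, would give 5 at every evaluation here.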
