Alert went back to OK even though it shouldn't have

Hi,

I had a case where an alert went back to OK when in fact the value was still above the threshold. This resulted in two alerts for the same issue.

[screenshot: graph of the alert value over time]

Any idea why this happened? It should not have gone back to OK since it was still above 0.

Hi @tlindqvist :wave:

Could you share the details of the two alerts you received?

So far, I can see an expected OK status around 00:05, when the value was not above 0.

Is it possible that the alert is part of a group, so you are receiving the same OK twice because of that?

Hi @antonio :wave:

Thanks for your reply.

The alert rule belongs to a group with the same name as the alert, which was set up automatically during the migration from legacy to unified alerting, so there is only one alert rule in that group.
The group is evaluated every 1m and the alert rule has a “for” duration of 5m.
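To illustrate how I understand the evaluation interval and the “for” duration interact, here is a minimal sketch (a simplified, hypothetical state machine, not Grafana’s actual implementation):

```python
# Hypothetical, simplified model of an evaluation interval plus a "for"
# duration; not Grafana's actual code.

EVAL_INTERVAL_S = 60   # group evaluated every 1m
FOR_DURATION_S = 300   # "for" duration of 5m
THRESHOLD = 0

def simulate(values):
    """values: one sampled value per evaluation; returns the state per step."""
    states = []
    above_since = None
    for step, value in enumerate(values):
        t = step * EVAL_INTERVAL_S
        if value > THRESHOLD:
            if above_since is None:
                above_since = t
            state = "Alerting" if t - above_since >= FOR_DURATION_S else "Pending"
        else:
            above_since = None  # a single low sample resets the rule to Normal
            state = "Normal"
        states.append(state)
    return states

# One stray 0 in an otherwise-breaching series drops the rule back to Normal
# and restarts the "for" timer, which would produce a second alert later.
print(simulate([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]))
```

In this model a single evaluation at or below the threshold is enough to send the rule back to Normal, which is why I don’t understand the state change below.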

Here is the state history:

What we don’t understand is how it could change to the Normal state at 2023-06-27 23:10:08.

We have an HA setup with two Grafana nodes. Could that be a factor somehow?

We see more cases of this behavior. Does anybody have any idea what could be causing this?

It turned out to be a timing issue where one value came in before another: for example, a 0 came in before a 1, which cleared the threshold. Grouping by 1m does not show this in the graph. I think we need to increase the group-by interval to 2m or change the way the alert is set up. Thanks for your time.
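To illustrate the theory (hypothetical timestamps and a made-up max-per-bucket aggregation, not our actual query): at evaluation time the out-of-order 1 has not arrived yet, so a 1m bucket contains only the stray 0 and the condition clears, while a wider 2m bucket still holds an earlier 1.

```python
from collections import defaultdict

# Hypothetical (timestamp_in_seconds, value) samples as they had arrived at
# evaluation time; the out-of-order 1 (around t=185) is still in flight.
arrived = [(0, 1), (65, 1), (125, 1), (190, 0)]

def max_per_bucket(samples, bucket_s):
    """Group samples into time buckets and take the max per bucket,
    mimicking a 'group by time(...), max()' style query."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // bucket_s].append(value)
    return {b * bucket_s: max(vs) for b, vs in sorted(buckets.items())}

# With 1m buckets the newest bucket holds only the stray 0, so the
# "above 0" condition clears; with 2m buckets it still contains a 1.
print(max_per_bucket(arrived, 60))    # {0: 1, 60: 1, 120: 1, 180: 0}
print(max_per_bucket(arrived, 120))   # {0: 1, 120: 1}
```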
