Grafana Alerting issue in HA mode when Alert gets updated

Hi there :wave:

I have a question regarding Grafana Alerting behavior in HA mode :innocent:

Nominal usage

  • we have 2 Grafana instances with their Alert-managers peered
  • An alert goes into the “firing” state
    • In the Alert state history, we can see state transitions from both instances
    • In our ContactPoint backend, we see only 1 Alert, so Alert-manager deduplication works as expected :tada:

:white_check_mark:

Issue

However, when we update the Alert definition (the update goes through only 1 Grafana instance because of the load-balancer),

  • The Alert state of this instance is reset to the “Normal” state: :ok:
  • The alert in our ContactPoint backend gets closed: :ok:
  • BUT a new Alert with the same name is created in our ContactPoint backend again a few seconds/minutes later

Hypothesis

  • The second Grafana instance still has the previous Alert definition and creates a new Alert in our ContactPoint backend …
  • Then, at the next evaluation interval, that instance has the new Alert definition and sends a “Close” event as expected

The issue is that the evaluation interval can be several hours depending on the Alert :sweat_smile:

→ This means the team gets notified twice, and the alert stays open (in fact, it gets re-created) for the whole evaluation interval despite having been updated …

Questions

:question: Has anyone else run into the same issue?
If so, what did you do to address it?

Many thanks for your feedback :pray:

Did you verify your HA setup?

Hello @jangaraj ,

Yes, we did (or did I miss something?), and HA for alerting works, as detailed in the initial post :smile:

The issue we are facing occurs when an alert definition gets updated :confused:

Hello there :wave:

Does anyone have a suggestion about that?

Would adding an external Alert-Manager help?
(As mentioned here: Configure high availability | Grafana documentation)

Resetting all alert instances to Normal is expected after you change the rule. In other words, if a rule that has instances in the “Alerting” state gets updated, all those instances are resolved. Then, at the next evaluation, the result produces instances in the Alerting state again, and you get a new notification about them.
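To make that sequence concrete, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not Grafana's actual code: the `ToyRule` class, its methods, and the rule name `disk-full` are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRule:
    """Hypothetical stand-in for an alert rule; not Grafana's data model."""
    name: str
    version: int = 1
    state: str = "Normal"
    notifications: list = field(default_factory=list)

    def evaluate(self, condition_met: bool) -> None:
        # A notification is recorded only when the state actually changes.
        new_state = "Alerting" if condition_met else "Normal"
        if new_state != self.state:
            self.notifications.append("firing" if condition_met else "resolved")
        self.state = new_state

    def update(self) -> None:
        # Assumption taken from the reply above: editing the rule resolves
        # any Alerting instance and resets the state to Normal.
        self.version += 1
        if self.state == "Alerting":
            self.notifications.append("resolved")
        self.state = "Normal"


def simulate() -> list:
    rule = ToyRule("disk-full")
    rule.evaluate(condition_met=True)  # first evaluation: alert fires
    rule.update()                      # rule edited: instance is resolved
    rule.evaluate(condition_met=True)  # next evaluation: alert fires again
    return rule.notifications


if __name__ == "__main__":
    print(simulate())
```

In this model the contact point receives a firing, a resolved, and then a second firing notification, which matches the "closed, then re-created" pattern reported in the first post, as long as the underlying condition is still true at the next evaluation.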

Depending on the version you use, some fields may be ignored, i.e. changes to those fields do not cause the state to be reset.