Hi there
I have a question regarding Grafana Alerting behavior in HA mode
Nominal usage
- We have 2 Grafana instances with their Alertmanagers peered
- An alert goes into the “Firing” state
- In the alert state history, we can see the state transition recorded for both instances
- In our ContactPoint backend, we see only 1 Alert, so Alertmanager deduplication works as expected
Issue
However, when we update the Alert definition (the update is handled by only 1 Grafana instance, since we go through the load balancer):
- The Alert state on this instance is reset to “Normal”
- The alert in our ContactPoint backend gets closed
- BUT a new Alert with the same name is created again in our ContactPoint backend a few seconds/minutes later…
Hypothesis
- The second Grafana instance still has the previous Alert definition and creates a new Alert in our ContactPoint backend …
- Then, at its next evaluation interval, the second Grafana instance picks up the new Alert definition and sends a “Close” event as expected
The issue is that the evaluation interval can be several hours, depending on the Alert
→ This means the team gets notified twice, and the alert stays open (in fact, it gets re-created) for the whole evaluation interval despite having been updated …
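To make the hypothesis above concrete, here is a minimal Python sketch of the timeline. It is only a toy model of the hypothesis, not Grafana code: the names `simulate` and `duplicate_open_seconds`, the instance labels "A" and "B", and the `resend_delay` parameter are all made up for illustration, and the assumption that the second instance only picks up the new definition at its next evaluation tick is exactly the part that is unconfirmed.

```python
def simulate(eval_interval, update_at, resend_delay):
    """Toy timeline of events reaching the contact point (times in seconds).

    ASSUMPTION (unverified hypothesis): instance B keeps the old rule
    definition until its next evaluation tick.
    """
    events = []
    # Instance A handles the rule update, resets the rule to Normal,
    # and the open alert gets closed in the contact point backend.
    events.append((update_at, "A", "resolved"))

    # Next evaluation tick of instance B after the update.
    next_eval_b = (update_at // eval_interval + 1) * eval_interval

    # Hypothetical delay before B re-sends its still-firing alert.
    resend_at = update_at + resend_delay
    if resend_at < next_eval_b:
        events.append((resend_at, "B", "firing"))
        # B finally evaluates the new definition and closes the alert.
        events.append((next_eval_b, "B", "resolved"))
    return events


def duplicate_open_seconds(events):
    """How long the re-created alert stays open in the backend."""
    firing = [t for t, _, kind in events if kind == "firing"]
    if not firing:
        return 0
    last_resolved = max(t for t, _, kind in events if kind == "resolved")
    return last_resolved - firing[0]


if __name__ == "__main__":
    timeline = simulate(eval_interval=3600, update_at=600, resend_delay=60)
    for event in timeline:
        print(event)
    print(duplicate_open_seconds(timeline))
```

With a 1-hour evaluation interval, an update 10 minutes after the last evaluation, and a 60-second re-send delay, the re-created alert stays open for 2940 seconds (49 minutes) in this model, which matches the kind of duration we observe.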
Questions
Has anyone faced the same issue?
If so, how did you address it?
Many thanks for your feedback