Reposting here since I didn’t get any answers on Stack.
I’ve set up Grafana by deploying the official Helm chart with ArgoCD, and I have three Grafana pods running. To achieve HA and avoid duplicate notifications, I configured the unified_alerting section of grafana.ini like so:
```yaml
unified_alerting:
  enabled: true
  ha_peers: "grafana-headless.grafana.svc.cluster.local:9094"
  ha_peer_timeout: "30s"
  ha_listen_address: "${POD_IP}:9094"
  ha_advertise_address: "${POD_IP}:9094"
  ha_reconnect_timeout: "2m"
```
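For completeness, the `grafana-headless` service and the `POD_IP` variable come from the chart values. A trimmed sketch of the relevant parts (assuming the official grafana/grafana chart and its `headlessService` / `envValueFrom` options):

```yaml
# Values for the official grafana/grafana Helm chart (trimmed sketch).
replicas: 3

# Creates the grafana-headless service that ha_peers resolves against.
headlessService: true

# Exposes each pod's own IP as POD_IP via the Kubernetes downward API,
# so grafana.ini can reference ${POD_IP}.
envValueFrom:
  POD_IP:
    fieldRef:
      fieldPath: status.podIP
```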
I see no errors in the logs, and the alertmanager metrics show the following:
```
alertmanager_cluster_members      3
alertmanager_cluster_failed_peers 0
alertmanager_cluster_health_score 0
```
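A per-pod view of the same metric should catch a pod that is stuck in its own single-member cluster despite the healthy-looking totals. A sketch of such a check, assuming the scrape job attaches a `pod` label:

```promql
# Every pod should report 3; a pod reporting 1 never joined the gossip cluster.
min by (pod) (alertmanager_cluster_members)
```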
So as far as I can tell, the configuration works. However, when an alert fires, I receive 2 notifications, and I can’t figure out why. This happens across multiple notification policies (with both email and Teams notifications). Sometimes the duplicate arrives a few seconds after the first one (the ha_peer_timeout value, in fact), and sometimes there is no duplicate at all. I also find it odd to get exactly 2 notifications; I would expect a broken HA setup to produce 3, one per pod.
I’d like to receive only one notification per alert; since I have a lot of alerts, the duplicates flood my notification channels.