Grafana Alertmanager cluster size

gedohub · May 12, 2025, 6:27am

What Grafana version and what operating system are you using?
I have tested with Grafana 11.4.1 and 11.6.1
What are you trying to achieve?
I’m trying to run multiple Grafana setups on one Kubernetes cluster. Each setup has two replicas.
How are you trying to achieve it?
I’m using the Helm chart. My HA setup:

...
  extraExposePorts:
    - name: "grafana-alert"
      port: 9094
      targetPort: 9094

  grafana.ini:
...
    unified_alerting:
      enabled: true
      ha_listen_address: "${grafana_pod_ip}:9094"
      ha_advertise_address: "${grafana_pod_ip}:9094"
      ha_peers: "$CLUSTER_NAME:9094"
...

  serviceMonitor:
    enabled: true

...

Each Grafana setup is in a different namespace.

What happened?
Alertmanager connects to ALL Alertmanagers on the Kubernetes cluster. If I create 20 or 30 Grafana setups, they all connect to each other (20 setups * 2 replicas = 40 Alertmanager peers). Why? How can I prevent this?
It seems that alert delivery is delayed because of this mesh clustering.
What did you expect to happen?
Each Grafana setup connects ONLY to its own Grafana Alertmanagers.
Can you copy/paste the configuration(s) that you are having problems with?
Not sure which part.
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
If I delete the deployment and re-create it, it has only its own replicas in the Alertmanager peer list, but within a few minutes all the other Grafana setups are introduced to it. Log example:

logger=ngalert.multiorg.alertmanager component=clustering t=2025-05-08T09:08:34.884518081Z level=debug memberlist="2025/05/08 09:08:34 [DEBUG] memberlist: Stream connection from=10.15.99.117:41146\n"
logger=ngalert.multiorg.alertmanager component=clustering t=2025-05-08T09:08:34.885068991Z level=debug received=NotifyJoin node=01JTJB58VHEEKEHK1AZCC9C7R5 addr=10.15.118.205:9094
logger=ngalert.multiorg.alertmanager component=clustering t=2025-05-08T09:08:34.885105163Z level=debug received=NotifyJoin node=01JTJRRZ26HXEJCW0S28PMK34G addr=10.15.184.150:9094
logger=ngalert.multiorg.alertmanager component=clustering t=2025-05-08T09:08:34.885126984Z level=debug received=NotifyJoin node=01JTJPQQMBXGB4AMP3449S5ZF7 addr=10.15.106.212:9094
logger=ngalert.multiorg.alertmanager component=clustering t=2025-05-08T09:08:34.885145705Z level=debug received=NotifyJoin node=01JTJB43D13BVHKJTDM6A16S2J addr=10.15.31.177:9094
logger=ngalert.multiorg.alertmanager component=clustering t=2025-05-08T09:08:34.885163786Z level=debug received=NotifyJoin node=01JTJWW3VM6N1TQ6YYBCBMASGV addr=10.15.129.87:9094
...

Did you follow any online instructions? If so, what is the URL?
- helm-charts/charts/grafana/README.md at main · grafana/helm-charts · GitHub
- Configure high availability | Grafana documentation

I’m confused about how Grafana Alertmanager discovers peers. I have done some experiments disabling and enabling things, but didn’t understand the mechanics behind it. I’m looking for a good explanation of how that discovery works and how to prevent it.

yuriy.tseretyan · May 16, 2025, 9:09pm

Is CLUSTER_NAME the same for all namespaces? That could explain why they all join the same cluster. ha_peers should point to a headless service scoped to the namespace.
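
As a rough sketch of that suggestion (not a verified fix for this setup), ha_peers could be set to the fully qualified DNS name of a headless service in the release's own namespace. The release name "grafana-team-a" and namespace "team-a" below are hypothetical. The headlessService option and the "<release>-headless" service name are assumptions about the chart that should be checked against its README. The "cluster.local" suffix assumes the default cluster domain.

```yaml
# Sketch only: "grafana-team-a" (release) and "team-a" (namespace) are hypothetical names.
headlessService: true   # assumption: chart option expected to create "<release>-headless"

grafana.ini:
  unified_alerting:
    enabled: true
    ha_listen_address: "${grafana_pod_ip}:9094"
    ha_advertise_address: "${grafana_pod_ip}:9094"
    # Namespace-scoped peer address: <service>.<namespace>.svc.cluster.local:<port>
    ha_peers: "grafana-team-a-headless.team-a.svc.cluster.local:9094"
```

Each setup would use its own release and namespace in ha_peers, so the name resolves only to that setup's pods. Because cluster membership spreads by gossip, existing pods that have already joined the larger mesh would probably need to be restarted after the change.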
