Hi community,
I have a very strange issue.
Context
- In my organization, we are using Grafana alerting (so far, so good).
- We have recently set up HA for alerting (I hope I am not misleading with that information ^^).
- Our Grafana instances are deployed with the Grafana-Operator on Kubernetes; a rough sketch of the relevant configuration is below.
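For reference, here is roughly what the HA part of the setup looks like. This is only a sketch: the namespace, headless service name, and environment-variable approach are illustrative placeholders rather than our exact manifests; the `[unified_alerting]` keys are passed through the Grafana CR managed by the Grafana-Operator.

```yaml
# Rough sketch only -- names and namespace are placeholders.
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  namespace: monitoring
spec:
  config:
    unified_alerting:
      enabled: "true"
      # Headless service resolving to both Grafana pods (illustrative name)
      ha_peers: "grafana-alerting-peers.monitoring.svc:9094"
      ha_listen_address: "0.0.0.0:9094"
      # ha_advertise_address is set per pod (to the pod IP) via the
      # GF_UNIFIED_ALERTING_HA_ADVERTISE_ADDRESS environment variable.
```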
Observations
Observation 1
- A given alert has been in the firing state for more than 3 days on one Grafana instance.
- On the other instance, which we sometimes hit when going through the ALB ingress in front of both of them, the same alert is reported as "firing" for only 16h.
→ 1st question: how come?
Note:
- The alert is based on a Prometheus query.
- The Prometheus data source for this metric and alert is an ALB endpoint; behind it are 2 Prometheus servers (a sketch of the data source definition follows this note).
- We are aware that the data source has no "stickiness" regarding which Prometheus server is targeted… We plan to use Thanos Query + Thanos sidecars to address this later (this is not the main debate of this topic, I think).
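To make that concrete, the data source is declared roughly like this (standard Grafana data source provisioning format; the name and URL are placeholders, the real URL points at the ALB in front of the two Prometheus servers):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # ALB endpoint load-balancing across the two Prometheus servers (placeholder URL)
    url: http://prometheus-alb.internal.example.com:9090
    isDefault: true
```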
Observation 2
Here are some Grafana internal metrics from our 2 instances:
→ Why don't we have the same number of alerts in total?
Both instances are using the same MariaDB instance as a backend.
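For completeness, the shared database is configured along these lines in the same Grafana CR as above (host and credentials are placeholders; only the `[database]` section is shown):

```yaml
spec:
  config:
    database:
      type: mysql
      # Shared MariaDB instance used by both Grafana replicas (placeholder host)
      host: mariadb.monitoring.svc:3306
      name: grafana
      user: grafana
      # The password is injected via GF_DATABASE_PASSWORD from a Secret, not inline.
```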
Observation 3
Here are some AlertManager cluster metrics:
→ I have to admit, I don't know which metric I am supposed to look at, and I don't see anything obvious, except that the number of messages sent/received matches the period when the alert was spamming our Slack channel, until we silenced it.
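For what it's worth, this is the kind of check I was considering for the cluster health, but please take it with a grain of salt: the metric names below are the upstream Prometheus Alertmanager cluster metrics, and I am not certain Grafana's embedded Alertmanager exposes them under exactly these names.

```yaml
# Sketch of a Prometheus alerting rule on the Alertmanager cluster metrics.
# Assumes the upstream metric names (alertmanager_cluster_*) are exposed as-is
# by Grafana's embedded Alertmanager -- please correct me if that is wrong.
groups:
  - name: grafana-alerting-ha
    rules:
      - alert: GrafanaAlertmanagerClusterDegraded
        # With 2 Grafana replicas, each peer should report 2 cluster members.
        expr: min(alertmanager_cluster_members) < 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Grafana Alertmanager HA cluster has fewer members than expected"
      - alert: GrafanaAlertmanagerFailedPeers
        expr: max(alertmanager_cluster_failed_peers) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Grafana Alertmanager HA cluster reports failed peers"
```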
Do any of you have suggestions regarding this issue?
I can provide more details if required.
Thanks for your time and help