We use Grafana to manage our logs, and we use alerts to be notified of every ERROR log.
We use this expression: sum by (line) (count_over_time({swarm_service="api"} |= "ERROR" | pattern "<line>" [2m])) > 0
with an “Alert evaluation behavior” of 1 min.
The “Rule group evaluation interval” is 1 min.
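In Loki ruler YAML terms, the rule is roughly equivalent to the sketch below (group and alert names are placeholders; we actually configure everything through the browser UI):

```yaml
groups:
  - name: api-error-alerts          # placeholder group name
    interval: 1m                    # the “Rule group evaluation interval”
    rules:
      - alert: ApiErrorLine         # placeholder alert name
        expr: |
          sum by (line) (
            count_over_time({swarm_service="api"} |= "ERROR" | pattern "<line>" [2m])
          ) > 0
        for: 1m                     # the 1 min “Alert evaluation behavior”
        labels:
          severity: warning         # placeholder label
        annotations:
          summary: "ERROR line logged by the api service"
```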
This leads to three alerts being sent for a single error line. We tried tweaking the alert duration and the count_over_time range, but that didn't change anything.
We are also unable to assess the impact of changing any of the duration values; for example, setting “Alert evaluation behavior” to a shorter or longer value has no measurable effect. We also couldn't find any documentation that helps here.
We use Grafana Cloud and this used to work, but updates seem to have broken it.
We believe this is a bug in Grafana Cloud, but greatly appreciate help to try to debug this.
Hi, I'm having the same issue with Amazon Managed Grafana. Every alert I create fires 3 times and sends 3 messages.
Did you manage to fix this issue?
I also see one alert in Grafana UI, but three notifications.
My most reliable way to reproduce it is to have two instances of an alert overlap in time, so that alert instance 1 is still firing while alert instance 2 moves from pending to firing.
Here is an example of two log lines, each of which triggered one alert instance. One was printed at 2023-05-17 17:59:33,066, the other at 2023-05-17 17:59:53,719 (about 20 seconds apart).
These are the notifications I received (over both the email and Slack integrations):
Thanks! Looking at the screenshots, is it possible you are using $values in a custom label? That would explain the behaviour you are seeing!
You should also avoid using the value of the query in labels because it’s likely that every evaluation of the alert will return a different value, causing Grafana to create tens or even hundreds of alerts when you really only want one.
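To illustrate, here is a schematic snippet (the `A` ref ID and the label/annotation names are made-up examples, not taken from your rule):

```yaml
# Anti-pattern: the query value in a custom label. Labels define an alert's
# identity, so a value that changes between evaluations creates a "new"
# alert instance each time.
labels:
  severity: warning
  error_count: "{{ $values.A }}"    # avoid this
---
# Safer: keep labels stable and surface the value in an annotation instead.
labels:
  severity: warning
annotations:
  description: "Error count is {{ $values.A }}"
```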
Are you running Grafana in HA mode? If so, alerts will be evaluated once per replica (which would explain seeing each alert 3 times in the screenshot), but just one notification should be sent. If 3 notifications are being sent for the same alert then I think Grafana has been misconfigured. I see you are using Amazon Managed Grafana, and I don’t know if they use HA or not.
Can you share a screenshot of the firing alerts in Grafana UI, and also the notifications? I would like to see the labels to understand if these are different alerts (from the same rule) or duplicated notifications for the same alert.
Hi! Thanks for the screenshots! I think I understand where the confusion is here.
I think this is working as intended. You are asking Grafana to create an alert for each ERROR log. If I look at the screenshots Grafana is doing just that. You have two alerts: the first alert is for an error log at time 2023-05-19 08:34:45,879 and the second alert is for a different error log at time 2023-05-19 08:35:28,697.
I think the question is then: why does each email contain both alerts? The answer is that this is how grouping is configured in your Alertmanager configuration. If you want one alert per email, you'll need to disable grouping by changing it to Disable (...).
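To make the grouping part concrete, here is a sketch of the relevant piece of an Alertmanager route (the receiver name and grouping labels are examples, not your actual configuration):

```yaml
route:
  receiver: email-and-slack         # example receiver name
  group_by: [alertname]             # alerts sharing these labels are batched
                                    # into a single notification
  group_wait: 30s
  group_interval: 5m

# “Disabling” grouping means using the special value '...', which groups by
# all labels, so every distinct alert instance gets its own notification:
#   group_by: ['...']
```

In the Grafana UI this corresponds to the “Group by” field on the notification policy, if I remember correctly.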
1. Do you know if you are using Grafana Managed Alerts or Mimir alerts? The first screenshot looks like Grafana Managed Alerts to me, but I just wanted to check.
2. If you are using Grafana Managed Alerts, are you using the Grafana Cloud Alertmanager? The emails look like you are, but again I just wanted to check.
3. If both 1 and 2 are correct, did you select a preference in “Sends alert to”, and if so which one did you choose? You can find this in the Admin page under Alerting.
We have them configured under “Mimir / Cortex / Loki”.
We still have one Grafana Cloud alert configured, but it is for a different service, and its state history shows no state changes for the last 6 months.
We configure everything through the browser UI: Alerting > Alert rules (domain.grafana.net/alerting/list). We have a Loki data source sending us logs, and the alerts are configured on that data source.
I don’t think we are using this. We haven’t selected a preference there.
We ended up changing the group wait from 1s to 30s and the group interval from 1s to 5m (the default values). This seems to have fixed it; it has been running for a few months now without duplicate alerts.
I'm not really sure what these settings even do, since we have grouping disabled. We spoke a bit with customer support, but the conclusion was “increase the numbers”, which seems to have helped in our case.
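For anyone else who lands here, this is our understanding of what the two settings mean, in Alertmanager route terms (sketch; the receiver name is a placeholder):

```yaml
route:
  receiver: our-contact-point       # placeholder
  group_wait: 30s     # how long to wait after the first alert of a group
                      # arrives before sending the initial notification
  group_interval: 5m  # how long to wait before sending a notification about
                      # changes to a group that has already been notified
```

Our working theory (not confirmed by support) is that with both values at 1s, every evaluation that updated an alert could flush a notification almost immediately, so overlapping alert instances looked like duplicate notifications; the default values batch those updates together.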