I have the latest kube-prometheus-stack chart installed on my cluster, with Grafana alerts configured for some important metrics (CPU usage and pod status, for example).
My issue is that every time the Prometheus pod restarts (whether I restart it or it restarts on its own), my alerts go into the DataSourceError state and immediately send notifications to my Discord channel. It looks like a flood of errors, even though they are all essentially false positives.
Is there a way to set a global timeout for this Error state to a larger value? For example 5 minutes, so Prometheus can restart and replay its WAL without alerts flooding my channel?
If there is a way to set this value can you point me there?
Hi, I don’t know of such a timeout value, but when I had datasource problems we changed the rule’s "Alert state if execution error or timeout" setting from Error to Alerting / Keep Last State.
With Error, the alert fires as soon as the first error is recorded, and I’m not aware of a way to configure a delay for that. With Alerting, an execution error is treated like a single breach of the threshold. I’d recommend Keep Last State, since that option has been brought back in recent Grafana versions.
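If you manage your alert rules through file provisioning, this is roughly where that setting lives. A minimal sketch, assuming file-provisioned rules and a Grafana version that supports Keep Last State (older versions only offer OK, Alerting, and Error); the group name, folder, uid, title, query, and datasource uid are placeholders for your own rule:

```yaml
# grafana/provisioning/alerting/cpu-alerts.yaml (sketch)
apiVersion: 1
groups:
  - orgId: 1
    name: cpu-alerts              # placeholder group name
    folder: kubernetes            # placeholder folder
    interval: 1m
    rules:
      - uid: cpu-usage-high       # placeholder uid
        title: CPU usage high
        condition: B
        data:
          # A: the Prometheus query (this is what errors while the pod restarts)
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus   # placeholder: your Prometheus datasource uid
            model:
              refId: A
              instant: true
              expr: sum(rate(container_cpu_usage_seconds_total[5m]))
          # B: threshold expression that turns A into the alert condition
          - refId: B
            datasourceUid: "__expr__"
            model:
              refId: B
              type: threshold
              expression: A
              conditions:
                - evaluator:
                    type: gt
                    params: [2]
        for: 5m
        noDataState: NoData
        # What to do on execution errors (e.g. Prometheus restarting):
        # Error fires a DataSourceError alert immediately, Alerting treats the
        # error like a threshold breach, OK / KeepLast avoid the error notification.
        execErrState: KeepLast
```

If you edit the rule in the UI instead, the same setting is under "Configure no data and error handling" → "Alert state if execution error or timeout".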
Besides changing the NoData/Error state handling, you could also add a notification policy that matches the DataSourceError alerts (for one alert rule or for all of them) and decides how those alerts are grouped and when they are delivered.
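With file-provisioned notification policies that could look roughly like this. A minimal sketch, assuming a contact point named discord already exists and that your Grafana version labels these generated alerts with alertname=DatasourceError (check the labels on an actual notification to confirm):

```yaml
# grafana/provisioning/alerting/notification-policies.yaml (sketch)
apiVersion: 1
policies:
  - orgId: 1
    receiver: discord                 # placeholder: your default contact point
    group_by: ['grafana_folder', 'alertname']
    routes:
      # Route the auto-generated error alerts separately, so a Prometheus
      # restart has time to finish before anything is delivered.
      - receiver: discord
        object_matchers:
          - ['alertname', '=', 'DatasourceError']
        group_by: ['alertname']
        group_wait: 5m                # wait before sending the first notification
        group_interval: 10m
        repeat_interval: 4h
```

The idea is that if Prometheus comes back within the group_wait window, the error alerts should resolve before the first notification for that group is ever sent.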