For the new unified alerting, what is the ideal way of catching errors that are due to the datasource?
For instance, most (if not all) of our alerts have the “execErrState” field set to “Alerting”, which means that if a datasource is erroring, every alert that uses it fires. One way around this is to create and manage a dedicated alert per datasource to catch the error, and set every other alert’s “execErrState” to “OK”, so we get a single alert for a downed datasource instead of however many alerts happen to use it.
I’ve also seen in the docs that we can set the field to “Error”. Although the alert would then fire as a DatasourceError (according to the docs), wouldn’t you still get flooded by each individual alert if you have many alerts set up against that datasource?
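To make my workaround concrete, this is roughly what I mean, as a minimal sketch in the file-provisioning format (I’m assuming Grafana 9+ unified alerting file provisioning; the UIDs, folder, titles and datasource UID below are placeholders and the query models are elided):

```yaml
apiVersion: 1
groups:
  - orgId: 1
    name: datasource-health        # placeholder group name
    folder: alerts                 # placeholder folder
    interval: 1m
    rules:
      # Dedicated "catch" rule for the datasource: only this one reports errors.
      - uid: prom-health           # placeholder UID
        title: Prometheus datasource health
        condition: A
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: PROM_UID   # placeholder datasource UID
            model: {}                 # query model elided in this sketch
        execErrState: Error           # fires as DatasourceError when the query errors
        noDataState: OK
        for: 1m

      # A regular alert on the same datasource: stays quiet on datasource errors.
      - uid: high-latency           # placeholder UID
        title: High request latency
        condition: A
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: PROM_UID
            model: {}                 # query model elided in this sketch
        execErrState: OK              # don't fire this rule when the datasource errors
        noDataState: OK
        for: 5m
```

With that split, a datasource outage would (if I understand it right) produce one notification from the health rule instead of one per rule.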
Unless, of course, I’ve misunderstood the above. The main issue is receiving tons of alerts for what is really a single datasource issue. Thanks in advance.
There are two issues here. On the one hand, there is the spam sent by alerts that use a datasource which errors out or is no longer available.
On the other hand, only some metrics from that datasource might be unavailable, because the actual sources behind it can be very different (Prometheus, to give a very common example, scrapes many different targets). So it’s really hard to split this up logically or practically.
One thing that improves matters slightly is setting the alerts’ no data/error handling to ‘Alerting’. The advantage is that the alerts don’t get triggered immediately (after about 1 minute in my case, with Discord as the contact point) but only after whatever the pending period is for that alert. So that’s useful.
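In the same file-provisioning format as in your example above, the relevant part is just these fields (a sketch; everything else is placeholder and the queries are elided):

```yaml
rules:
  - uid: high-latency        # placeholder UID
    title: High request latency
    condition: A
    data: []                 # queries elided in this sketch
    noDataState: Alerting    # treat "no data" like a normal pending alert
    execErrState: Alerting   # treat datasource errors like a normal pending alert
    for: 5m                  # the alert only fires after 5 minutes in pending
```

Because the error state goes through the same pending period, a brief datasource hiccup shorter than the `for` duration shouldn’t notify at all.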
Unfortunately this doesn’t solve the spam issue: if a datasource errors out or becomes completely unreachable for longer than that, all alerts bound to it still start going off and spam the destination channel. So I’m not sure how you can reasonably satisfy both of these needs.