DatasourceError, DatasourceNoData custom routing

My pain
In Grafana, I have one alert configured that can go into the firing, No Data, and Error states. I need:

  1. Firing to be sent to a common Telegram channel (important for decision making).
  2. No Data and Error to be sent to another channel just for me (needed for technical analysis).

Right now all states go to the same channel, because a single label and contact point are used.

My attempt
I tried distributing the alerts with templates. I created two templates: one that allowed only the firing state, and one that allowed only No Data and Error. This partially worked, but there was a problem: with No Data and Error handled in one template and firing in the other, I get the following error: Telegram webhook response status 400 Bad Request (presumably because the filtered-out states leave the rendered message body empty).

Here is an example template:

{{ define "tgbody_crit" -}}{{ range . }}
{{ if eq .Labels.alertname "DatasourceNoData" }}

{{ else if eq .Labels.alertname "DatasourceError" }}

{{ else }}<b>{{ .Labels.alertname }}</b>
{{ with .ValueString }}{{ reReplaceAll "[[][^]]*metric='{?([^}']*)}?'[^]]*labels={([^}]*)}[^]]*value=(-?[0-9]*[.]?[0-9]+([eE][+-]?[0-9]+)?)[^]]*](, )?" "$2\n$1: <b>$3</b>\n\n" . }}{{ end }}
{{ range .Annotations.SortedPairs }}{{ .Name }}: {{ .Value }}{{ end }}
{{ with .GeneratorURL }}✏️<a href="{{ . }}">Edit</a> × {{ end }}{{ with .PanelURL }}📉<a href="{{ . }}">View</a> × {{ end }}{{ with .SilenceURL }}🔕<a href="{{ . }}">Mute</a>{{ end }}
{{ end }}
{{ end }}{{ end }}

Please help me configure routing by alert state so the alerts can be distributed to different channels.

Datasource error and no data conditions generate alerts with special alertnames (DatasourceError, DatasourceNoData), which you can use in your notification policy (order is important and, of course, “Continue matching subsequent sibling nodes” must be disabled). In my example I’m dropping all these noisy alerts (but I use the alert state history to monitor these problems, so I do care about the health of my alerts).
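For reference, a minimal sketch of that kind of nested route using Grafana’s alerting file provisioning. The contact point names telegram-common and telegram-technical are assumptions, and field names can vary slightly between Grafana versions, so treat this as a starting point rather than a drop-in config:

apiVersion: 1
policies:
  - orgId: 1
    receiver: telegram-common          # assumed contact point for the common channel (firing alerts)
    routes:
      # Route the special alertnames to a separate contact point first.
      # "Continue matching subsequent sibling nodes" stays disabled (continue: false).
      - receiver: telegram-technical   # assumed contact point for the technical-only channel
        object_matchers:
          - ['alertname', '=~', 'DatasourceError|DatasourceNoData']
        continue: false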

Thank you very much for the tip!
I’ve already run several tests, and it works!

So the label is literally “alertname” and its values are “DatasourceError” and “DatasourceNoData” respectively?

Yes, literally that.

And then, in your case, are all the other alerts set to “Normal” for datasource no data and error?

What is “Normal”? I set Error and No Data:

It’s here in the documentation:

The definition is not great:

Sets alert instance state to Normal.

Maybe it just means that it doesn’t do anything. So, in contrast to “Keep Last State”, it won’t keep firing if it had already started firing when the data source problem occurred. I’m not sure.
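For what it’s worth, in Grafana’s alert rule file provisioning this handling maps to the noDataState and execErrState fields. A fragment showing only those fields (the rest of the rule definition is omitted, and the accepted values can differ between Grafana versions):

rules:
  - title: my-alert            # rest of the rule (condition, data, labels, ...) omitted
    noDataState: NoData        # produces the special DatasourceNoData alert; OK would set the instance to Normal instead
    execErrState: Error        # produces the special DatasourceError alert; OK would set the instance to Normal instead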

I have another question though. It seems to me that your solution doesn’t actually solve the spam issue if you still want to receive alerts about datasource problems. Those matchers wouldn’t limit the number of alerts; it’s not as if you get just one alert per datasource issue, right?

And in your case you’re going to see the spammy information somewhere else anyway (in the alert history). It’s not necessarily a bad solution, but I would really want to know about datasource issues in a channel that I normally follow, and I wouldn’t want to check additional things on a regular basis. And, of course, I don’t want the channel to be spammed 🙂

My case: I manage an instance where I, as the admin, define the datasources and users define their own alerts.
What’s the point of sending DatasourceError and DatasourceNoData to the users? None, because they have no idea about the infrastructure behind them. So I hijack all these DatasourceError and DatasourceNoData alerts away from the users: they aren’t annoyed by the noise, and from their perspective they receive only genuine alerts.
But I, as the Grafana admin, do want to know about DatasourceError and DatasourceNoData. I just don’t want to receive millions of notifications about it, so I send all these alert notifications to “/dev/null” and completely ignore them. (Feel free to forward them to your own channel if you want. It depends on the number of alerts, but I bet you will develop notification blindness at some point.)
I have my own alerts on alerting metrics. It’s normal to have occasional error/no data issues (network glitches, …), so I only get alerted when it is serious (e.g. it lasts for 10+ minutes).
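For illustration, a rough Prometheus-style sketch of such a meta-alert. The metric name grafana_alerting_rule_evaluation_failures_total is an assumption on my side (check what your instance actually exposes on /metrics), and the 10-minute window is just an example:

groups:
  - name: alerting-health
    rules:
      - alert: RuleEvaluationFailing
        # assumed metric name - verify against your Grafana's /metrics endpoint
        expr: increase(grafana_alerting_rule_evaluation_failures_total[10m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Grafana alert rule evaluations have been failing for 10+ minutes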
