Grafana alert rule triggers its associated notification policy more than once

Dear team:

One of my alert rules is associated with a notification policy that triggers a webhook, which sends a POST message to an external system.

I noticed that as long as my alert rule stays in the firing state, its associated notification policy is executed as often as the rule's evaluation interval.

I did not find a way to limit this behavior. I need the notification policy to be executed only once during the “lifetime” of the firing alarm. Is there any way to do it?

Hints will be greatly appreciated!
Best regards

Rogelio

Hi!

The alert rule sends every alert instance of the rule to the contact point when it goes into the firing state. Are new alert instances created while the alert rule is running, or does the rule have a fixed set of dimensions (label sets)?
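As a rough illustration of what an alert instance is: one instance exists per distinct label set returned by the rule's query, with the rule's own labels added. The sketch below is a simplified model, not Grafana's code; the `host` label and the sample series are hypothetical, and applying the rule labels last on conflicts is an assumption.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def instance_key(labels: Labels) -> Tuple[Tuple[str, str], ...]:
    """Order-independent identity of a label set (simplified model of an instance ID)."""
    return tuple(sorted(labels.items()))


def alert_instances(series: List[Labels], rule_labels: Labels) -> Dict[tuple, Labels]:
    """Return one entry per distinct label set, with the rule's labels merged in."""
    instances: Dict[tuple, Labels] = {}
    for series_labels in series:
        # Assumption: rule labels win if a key exists in both label sets.
        merged = {**series_labels, **rule_labels}
        instances[instance_key(merged)] = merged
    return instances


if __name__ == "__main__":
    # Hypothetical query result: three series, two of them from the same device.
    series = [
        {"job": "syslog", "host": "switch-1"},
        {"job": "syslog", "host": "switch-2"},
        {"job": "syslog", "host": "switch-1"},
    ]
    result = alert_instances(series, {"FLAP": "DOWNVALUE"})
    print(len(result))  # 2
```

If the set of label sets keeps changing, new instances keep appearing, and each new one is something new to notify about.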

The notification policy will keep re-sending the firing alerts every “repeat interval”; this is 4 hours by default but can be changed to any duration.
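To make the repeat interval concrete, here is a simplified model of when a single, continuously firing alert group is re-sent. It assumes Grafana's default timers (group wait 30s, group interval 5m, repeat interval 4h) and a group whose contents never change; it is a sketch of the Alertmanager-style scheduling, not Grafana's implementation.

```python
from typing import List


def notification_times(firing_minutes: float,
                       group_wait: float = 0.5,
                       group_interval: float = 5.0,
                       repeat_interval: float = 240.0) -> List[float]:
    """Minutes after the alert starts firing at which a notification is sent.

    The group is flushed at group_wait and then on every group_interval tick;
    an unchanged, still-firing group is only re-sent once repeat_interval has
    elapsed since the last notification.
    """
    times: List[float] = []
    last_sent = None
    tick = group_wait
    while tick <= firing_minutes:
        if last_sent is None or tick - last_sent >= repeat_interval:
            times.append(tick)
            last_sent = tick
        tick += group_interval
    return times


if __name__ == "__main__":
    four_weeks = 28 * 24 * 60
    print(len(notification_times(four_weeks)))                       # 168
    print(len(notification_times(four_weeks, repeat_interval=5.0)))  # 8064
```

Under this model, a notification every evaluation period would only be expected with a very short repeat interval. Another plausible cause is that new instances keep joining the group, since a changed group is re-notified at the next group interval tick rather than after the repeat interval.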

The behaviour you are describing, where the contact point receives the same alert instance(s) every evaluation period, is very odd. Do you have more information on how you've set up the alert rule and evaluation group?


Hello Gilles, thank you very much.

The concept of “alert instance” is new to me. Is an alert instance a unique combination of labels and information that triggers the rule?

My alert rule was built on purpose to stay in the firing state as long as possible, in order to evaluate how the notification policy is triggered (once or many times during the lifetime of the firing alarm). In fact, the rule has been firing without interruption for the last 28 days.

How I built this rule: every time any network device sends a log message containing the string “DOWN” to Loki, the alert is called for evaluation. The evaluation timeframe is 5 minutes. If the condition is met for more than 10 minutes, the alert fires. So once it fires, it stays in the firing state as long as any network device sends another message with the “DOWN” string, and this happens very frequently.
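The pending/firing behaviour described above can be modelled as a small state machine. This sketch assumes the rule is evaluated on a fixed interval (the group name `FiveMin` suggests 5 minutes, but the JSON below does not show the interval) and uses `for: 5m` from the rule; NoData, Error and other states are ignored.

```python
from typing import List, Optional


def simulate(conditions: List[bool], interval_min: float, for_min: float) -> List[str]:
    """State of one alert instance after each evaluation (simplified model).

    The condition must hold on consecutive evaluations for at least `for_min`
    minutes before the instance moves from Pending to Firing; a single false
    evaluation sends it back to Normal.
    """
    states: List[str] = []
    pending_since: Optional[float] = None
    for i, cond in enumerate(conditions):
        now = i * interval_min
        if not cond:
            pending_since = None
            states.append("Normal")
            continue
        if pending_since is None:
            pending_since = now
        states.append("Firing" if now - pending_since >= for_min else "Pending")
    return states


if __name__ == "__main__":
    print(simulate([True, True, False, True, True, True], interval_min=5, for_min=5))
```

While the condition stays true, the same instance simply remains Firing; the state machine alone does not produce a new notification per evaluation.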

The following is a screen capture of the alert as I see it in Grafana:

I ran the Alert Rule API to get the information associated with this rule:

```json
{
  "id": 2,
  "uid": "d98f9dfd-39ad-42d2-87c2-55572e38dbbc",
  "orgID": 1,
  "folderUID": "tAtrgRhSk",
  "ruleGroup": "FiveMin",
  "title": "Down",
  "condition": "C",
  "data": [
    {
      "refId": "A",
      "queryType": "instant",
      "relativeTimeRange": {"from": 600, "to": 0},
      "datasourceUid": "dd65c3e1-b7a2-410f-b117-cb3659f101bc",
      "model": {
        "datasource": {"type": "loki", "uid": "dd65c3e1-b7a2-410f-b117-cb3659f101bc"},
        "editorMode": "builder",
        "expr": "count_over_time({job=\"syslog\"} |= `DOWN` [1m])",
        "hide": false,
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "queryType": "instant",
        "refId": "A"
      }
    },
    {
      "refId": "C",
      "queryType": "",
      "relativeTimeRange": {"from": 600, "to": 0},
      "datasourceUid": "__expr__",
      "model": {
        "conditions": [
          {
            "evaluator": {"params": [2], "type": "gt"},
            "operator": {"type": "and"},
            "query": {"params": ["C"]},
            "reducer": {"params": [], "type": "last"},
            "type": "query"
          }
        ],
        "datasource": {"type": "__expr__", "uid": "__expr__"},
        "expression": "A",
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "refId": "C",
        "type": "threshold"
      }
    }
  ],
  "updated": "2024-02-07T10:56:04-03:00",
  "noDataState": "NoData",
  "execErrState": "Error",
  "for": "5m",
  "labels": {"FLAP": "DOWNVALUE"},
  "isPaused": false
}
```
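A simplified model of how this rule's condition is applied: query A is an instant `count_over_time` query, so it returns one value per log stream (per distinct stream label set), and expression C (`type: threshold`, evaluator `gt` with param 2) is checked against each series separately. Every series above the threshold is a separate alert instance. The `host` labels and counts below are hypothetical.

```python
from typing import Dict, List, Tuple

# A series is identified by its label set, stored as sorted (name, value) pairs.
SeriesKey = Tuple[Tuple[str, str], ...]


def threshold_gt(query_a: Dict[SeriesKey, float], threshold: float = 2.0) -> Dict[SeriesKey, bool]:
    """Model of expression C: evaluator {'type': 'gt', 'params': [2]} applied per series."""
    return {key: value > threshold for key, value in query_a.items()}


def alerting_series(query_a: Dict[SeriesKey, float], threshold: float = 2.0) -> List[SeriesKey]:
    """Series whose condition is true, i.e. candidates for Pending/Firing."""
    return sorted(key for key, ok in threshold_gt(query_a, threshold).items() if ok)


if __name__ == "__main__":
    # Hypothetical instant-query result: "DOWN" lines per stream in the last minute.
    query_a = {
        (("host", "switch-1"), ("job", "syslog")): 5.0,
        (("host", "switch-2"), ("job", "syslog")): 1.0,
        (("host", "switch-3"), ("job", "syslog")): 3.0,
    }
    for key in alerting_series(query_a):
        print(dict(key))
```

How many instances this produces depends on which labels the syslog streams carry. Wrapping the query in an aggregation such as `sum(count_over_time(...))` returns a single series, and therefore a single instance.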

I found the concept of alert instances explained in the Grafana documentation page “Alert instances”, and now I understand it.

I will refine my alert rule so that only one alert instance can trigger it, and then see how the notification policy is invoked.

In parallel, I will also adjust the timers in the notification policy section.
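Independently of the timers, the external system receiving the webhook could enforce “once per firing lifetime” itself by remembering which alert instances it has already acted on. This is a minimal in-memory sketch with a hypothetical class name; it assumes each item in the webhook payload's `alerts` list has `status` (`firing` or `resolved`) and `fingerprint` fields, which should be checked against the payload your Grafana version actually sends.

```python
from typing import Any, Dict, List, Set


class OncePerFiring:
    """Forward each alert instance only on its first 'firing' notification (hypothetical helper)."""

    def __init__(self) -> None:
        # Fingerprints already forwarded; kept in memory only, so lost on restart.
        self._active: Set[str] = set()

    def handle(self, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return the alerts in this payload that are new firings."""
        new_firing: List[Dict[str, Any]] = []
        for alert in payload.get("alerts", []):
            fp = alert.get("fingerprint")
            if fp is None:
                continue  # no identity to deduplicate on; this sketch skips it
            if alert.get("status") == "firing":
                if fp not in self._active:
                    self._active.add(fp)
                    new_firing.append(alert)
            else:
                # Resolved: forget it so a later firing episode is forwarded again.
                self._active.discard(fp)
        return new_firing


if __name__ == "__main__":
    dedup = OncePerFiring()
    firing = {"alerts": [{"status": "firing", "fingerprint": "abc"}]}
    print(len(dedup.handle(firing)))  # 1: first notification is forwarded
    print(len(dedup.handle(firing)))  # 0: repeat notification is ignored
```

A persistent store would be needed if the receiver can restart while an alert is still firing.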

Thanks!!!
Best regards, Rogelio