Hello Gilles, thank you very much.
The concept of an “alert instance” is new to me. Is an alert instance a unique combination of labels and information that triggers the rule?
I built my alert rule deliberately to stay in the firing state as long as possible, so that I can see how the notification policy is triggered (once, or many times during the lifetime of the firing alert). In fact, the rule has been firing without interruption for the last 28 days.
How I built this rule: every time any network device sends a log message containing the “DOWN” string to Loki, the alert is called for evaluation. The evaluation timeframe is 5 minutes. If the condition is met for more than 10 minutes, the alert fires. Once it fires, it stays in the firing state as long as any network device keeps sending messages with the “DOWN” string, and this happens very frequently.
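To make my understanding concrete, here is a rough sketch of the behaviour I expect. It is a simplified model I wrote for illustration, not Grafana's actual implementation: the function name `simulate`, the state names, and the default 5-minute interval and 10-minute pending period are my own assumptions taken from the description above.

```python
from datetime import timedelta


def simulate(breaches, interval=timedelta(minutes=5), pending=timedelta(minutes=10)):
    """Return the assumed rule state after each evaluation.

    breaches: one boolean per evaluation, True when the condition was met.
    interval: assumed time between evaluations.
    pending:  assumed time the condition must hold before the rule fires.
    """
    states = []
    breached_for = None  # how long the condition has held without interruption
    for breached in breaches:
        if not breached:
            # Any evaluation that does not meet the condition resets the rule.
            breached_for = None
            states.append("Normal")
            continue
        if breached_for is None:
            breached_for = timedelta(0)
        else:
            breached_for += interval
        states.append("Firing" if breached_for >= pending else "Pending")
    return states


print(simulate([True, True, True, True, False, True]))
```

In this model the rule only leaves the firing state when an evaluation does not meet the condition, which is why a steady stream of “DOWN” messages keeps it firing.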
The following is a screen capture of the alert as I see it in Grafana:
I ran the Alert Rule API to get the information associated with this rule:
{
  "id": 2,
  "uid": "d98f9dfd-39ad-42d2-87c2-55572e38dbbc",
  "orgID": 1,
  "folderUID": "tAtrgRhSk",
  "ruleGroup": "FiveMin",
  "title": "Down",
  "condition": "C",
  "data": [
    {
      "refId": "A",
      "queryType": "instant",
      "relativeTimeRange": {"from": 600, "to": 0},
      "datasourceUid": "dd65c3e1-b7a2-410f-b117-cb3659f101bc",
      "model": {
        "datasource": {"type": "loki", "uid": "dd65c3e1-b7a2-410f-b117-cb3659f101bc"},
        "editorMode": "builder",
        "expr": "count_over_time({job=\"syslog\"} |= `DOWN` [1m])",
        "hide": false,
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "queryType": "instant",
        "refId": "A"
      }
    },
    {
      "refId": "C",
      "queryType": "",
      "relativeTimeRange": {"from": 600, "to": 0},
      "datasourceUid": "__expr__",
      "model": {
        "conditions": [
          {
            "evaluator": {"params": [2], "type": "gt"},
            "operator": {"type": "and"},
            "query": {"params": ["C"]},
            "reducer": {"params": [], "type": "last"},
            "type": "query"
          }
        ],
        "datasource": {"type": "__expr__", "uid": "__expr__"},
        "expression": "A",
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "refId": "C",
        "type": "threshold"
      }
    }
  ],
  "updated": "2024-02-07T10:56:04-03:00",
  "noDataState": "NoData",
  "execErrState": "Error",
  "for": "5m",
  "labels": {"FLAP": "DOWNVALUE"},
  "isPaused": false
}
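
In case it helps anyone reproduce this, below is a minimal sketch of how a rule like this can be fetched. It assumes the Grafana alerting provisioning endpoint `/api/v1/provisioning/alert-rules/<uid>` and a service account token; the base URL and token in the usage comment are placeholders, and the helper names `build_rule_request` and `fetch_rule` are hypothetical.

```python
import json
import urllib.request


def build_rule_request(base_url, rule_uid, token):
    """Build a GET request for one alert rule (assumed provisioning endpoint)."""
    url = f"{base_url.rstrip('/')}/api/v1/provisioning/alert-rules/{rule_uid}"
    headers = {
        "Authorization": f"Bearer {token}",  # placeholder service account token
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_rule(base_url, rule_uid, token):
    """Send the request and return the rule definition as a dict."""
    with urllib.request.urlopen(build_rule_request(base_url, rule_uid, token)) as resp:
        return json.load(resp)


# Usage (placeholder URL and token, adjust to your own instance):
# rule = fetch_rule("http://localhost:3000", "d98f9dfd-39ad-42d2-87c2-55572e38dbbc", "MY_TOKEN")
# print(rule["for"], rule["ruleGroup"])
```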