We have alert rules creating alert instances on Azure Managed Grafana (v9.5.13) that should not be grouped and not be repeated (hence the high repeat interval).
Notification policy:
{
  "receiver": "Kafka",
  "group_by": [
    "..."
  ],
  "repeat_interval": "999h",
  "group_wait": "0s",
  "group_interval": "5m"
}
Expectation: each alert instance triggers its own notification, and the starts_at timestamp in the "resolved" alert matches the one in the initial "firing" alert.
Reality: the timestamps of "resolved" alerts do not match, and some "firing" alerts are never "resolved" at all. While an alert instance is still active, no duplicate alert should be triggered either. (The pairing logic I expected is sketched below the sample.)
Sample (starts_at - resolved_at - status):
- A triggered and was resolved by B with the same starts_at timestamp.
- C triggered, but no resolved alert carries its starts_at; likewise the resolved alert D matches no firing alert.
- Assumption: D belongs to C, but why does D have a different starts_at timestamp then?
- E and F behave normally again.
F: 2023-12-11 12:40:00 +0000 UTC - 2023-12-11 12:50:00 +0000 UTC - resolved
E: 2023-12-11 12:40:00 +0000 UTC - 0001-01-01 00:00:00 +0000 UTC - firing
D: 2023-12-11 12:14:00 +0000 UTC - 2023-12-11 12:19:00 +0000 UTC - resolved
C: 2023-12-11 12:11:00 +0000 UTC - 0001-01-01 00:00:00 +0000 UTC - firing
B: 2023-12-11 11:52:00 +0000 UTC - 2023-12-11 11:55:00 +0000 UTC - resolved
A: 2023-12-11 11:52:00 +0000 UTC - 0001-01-01 00:00:00 +0000 UTC - firing
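To make the mismatch reproducible on the receiving side, this is roughly how I pair the notifications coming out of the Kafka topic. It assumes each message carries Alertmanager-style alert fields (status, labels, startsAt, endsAt); whether the Kafka contact point payload looks exactly like this is an assumption on my side:

```python
# Rough sketch of the correlation I expected to work: a "resolved" alert should
# reuse the startsAt of the "firing" alert with the same label set.
# Input: a list of alert dicts with Alertmanager-style fields
# (status, labels, startsAt, endsAt).
from collections import defaultdict

def pair_firing_and_resolved(alerts):
    """Group alerts by label set, then match resolved to firing via startsAt."""
    by_labels = defaultdict(list)
    for a in alerts:
        key = tuple(sorted(a["labels"].items()))
        by_labels[key].append(a)

    unmatched = []
    for key, items in by_labels.items():
        firing_starts = {a["startsAt"] for a in items if a["status"] == "firing"}
        for a in items:
            if a["status"] == "resolved" and a["startsAt"] not in firing_starts:
                # This is exactly what happens with C/D in the sample above.
                unmatched.append(a)
    return unmatched
```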
Since these alert instances all carry the same labels and therefore the same fingerprint, I cannot identify them uniquely by that ID either. When checking the state history in the UI, some alerts that were actually sent via the notification policy are also missing there.
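For context on the fingerprint point: as far as I understand, the fingerprint is derived purely from the label set (an FNV-style hash inside Alertmanager), so two occurrences of the same alert instance always collide. A rough illustration of the property, not the actual Alertmanager hash:

```python
# Illustration only: identical labels -> identical fingerprint, so the ID
# cannot distinguish two occurrences of the same alert instance.
import hashlib

def label_fingerprint(labels: dict) -> str:
    canonical = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return hashlib.sha1(canonical.encode()).hexdigest()[:16]

a = {"alertname": "DiskFull", "host": "node-1"}
b = {"alertname": "DiskFull", "host": "node-1"}
assert label_fingerprint(a) == label_fingerprint(b)  # same labels, same "ID"
```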
Is anything else wrong in the policy JSON, or is there some other trick I'm missing?