-
What Grafana version and what operating system are you using?
- Grafana 12.3.0 (freshly migrated)
- Kubernetes (k3s 1.32), hand-rolled YAML based on our previous Compose setup
-
What are you trying to achieve?
- We have a LOT of stale NoData alerts that I want to get rid of.
-
How are you trying to achieve it?
- I have been digging through the documentation on provisioned alerts, trying to work out what the relevant settings and options are, hoping to understand where the problem originates.
-
What happened?
- We are still stuck with a lot of stale alerts. Once our datasource goes down, even briefly, they are extremely annoying to get rid of.
-
What did you expect to happen?
- After a while, an alert should automatically stop firing once its condition clears, and resolve itself. That isn’t happening.
-
Can you copy/paste the configuration(s) that you are having problems with?
- Yes, below together with screenshot.
-
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
- Nothing in the logs and no explicit error either.
-
Did you follow any online instructions? If so, what is the URL?
- No instructions; self-taught off of the Grafana documentation.
Here is an illustrative screenshot:
It is continuously stuck in the NoData state, although everything is fine. This is just one of many such examples.
We use OnCall to manage our alert states most of the time, but are preparing to migrate to KeepHQ since OnCall is sunsetting soon. Either way, that does not change the state of the stuck alerts.
Here is that alert’s configuration:
Alert config
```yaml
groups:
  - orgId: 1
    name: QNAP (NAS)
    folder: QNAP Systems, Inc.
    interval: 1m
    rules:
      - uid: senst-qnapCpuTempCrit
        title: CPU Temperature (Critical)
        condition: condition
        data:
          - refId: main
            relativeTimeRange:
              from: 3600
              to: 0
            datasourceUid: senst-qnap
            model:
              datasource:
                type: influxdb
                uid: senst-qnap
              intervalMs: 1000
              maxDataPoints: 43200
              query: "from(bucket: \"qnap\")\r\n |> range(start: v.timeRangeStart, stop: v.timeRangeStop)\r\n |> filter(fn: (r) => r[\"_measurement\"] == \"qnap.nas\")\r\n |> filter(fn: (r) => r[\"_field\"] == \"cpuTemperature\")\r\n |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)\r\n |> yield(name: \"last\")"
              refId: main
          - refId: alert
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: []
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    params: []
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: __expr__
              expression: main
              intervalMs: 1000
              maxDataPoints: 43200
              reducer: last
              refId: alert
              type: reduce
          - refId: condition
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 90
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - B
                  reducer:
                    params: []
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: __expr__
              expression: alert
              intervalMs: 1000
              maxDataPoints: 43200
              refId: condition
              type: threshold
        dashboardUid: n4WBsOJWk
        panelId: 19
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          __dashboardUid__: n4WBsOJWk
          __panelId__: "19"
          description: ""
          runbook_url: ""
          summary: ""
        labels:
          "": ""
          customer: senst
          severity: CRIT
        isPaused: false
```
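For reference, my current understanding is that the `noDataState` field above is what governs this behaviour. A variant I have been considering (assuming `OK` and `KeepLast` are accepted values for file-provisioned rules in 12.x, as the provisioning docs seem to suggest) would look like:

```yaml
# Hypothetical tweak to the rule above: instead of transitioning to the
# NoData state when the datasource returns nothing, either resolve the
# alert outright (OK) or hold its previous state (KeepLast).
noDataState: OK        # or: KeepLast
execErrState: Error
for: 5m
```

I would rather not mask real outages entirely, though, so I am unsure whether this is the right fix or just a workaround.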
What can I do to get rid of this stale alert, and others of its kind?
Our datasource went offline on Saturday, and there are a whole lot of them still sticking around and firing.
Thanks!
