(Datasource)NoData alerts stick around forever

  • What Grafana version and what operating system are you using?

    • Grafana 12.3.0 (freshly migrated)
    • Kubernetes (k3s 1.32), hand-rolled YAML based on a Compose setup
  • What are you trying to achieve?

    • We have a LOT of stale NoData alerts that I want to get rid of.
  • How are you trying to achieve it?

    • I have been digging through the documentation on provisioned alerts, trying to figure out exactly what the settings and options are, hoping to understand where the problem originates.
  • What happened?

    • We are still very much stuck with a lot of stale alerts. Whenever our datasource goes down, even once, these are extremely annoying to get rid of.
  • What did you expect to happen?

    • After a while, an alert should automatically stop firing once its condition clears, and resolve itself. That isn’t happening here…
  • Can you copy/paste the configuration(s) that you are having problems with?

    • Yes, see below, together with a screenshot.
  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

    • Nothing in the logs, and no explicit error either.
  • Did you follow any online instructions? If so, what is the URL?

    • No instructions; self-taught off of the Grafana documentation.

Here is an illustrative screenshot:

It is continuously stuck in the NoData state, although everything is fine. This is just one of many examples here.

We use OnCall to manage our alert states most of the time, but we are preparing to migrate to KeepHQ since OnCall is being sunset soon. Still, this does not change the state of the stuck alerts. :slight_smile:

Here is that alert’s configuration:

Alert config
groups:
    - orgId: 1
      name: QNAP (NAS)
      folder: QNAP Systems, Inc.
      interval: 1m
      rules:
        - uid: senst-qnapCpuTempCrit
          title: CPU Temperature (Critical)
          condition: condition
          data:
            - refId: main
              relativeTimeRange:
                from: 3600
                to: 0
              datasourceUid: senst-qnap
              model:
                datasource:
                    type: influxdb
                    uid: senst-qnap
                intervalMs: 1000
                maxDataPoints: 43200
                query: "from(bucket: \"qnap\")\r\n  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)\r\n  |> filter(fn: (r) => r[\"_measurement\"] == \"qnap.nas\")\r\n  |> filter(fn: (r) => r[\"_field\"] == \"cpuTemperature\")\r\n  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)\r\n  |> yield(name: \"last\")"
                refId: main
            - refId: alert
              datasourceUid: __expr__
              model:
                conditions:
                    - evaluator:
                        params: []
                        type: gt
                      operator:
                        type: and
                      query:
                        params:
                            - A
                      reducer:
                        params: []
                        type: last
                      type: query
                datasource:
                    type: __expr__
                    uid: __expr__
                expression: main
                intervalMs: 1000
                maxDataPoints: 43200
                reducer: last
                refId: alert
                type: reduce
            - refId: condition
              datasourceUid: __expr__
              model:
                conditions:
                    - evaluator:
                        params:
                            - 90
                        type: gt
                      operator:
                        type: and
                      query:
                        params:
                            - B
                      reducer:
                        params: []
                        type: last
                      type: query
                datasource:
                    type: __expr__
                    uid: __expr__
                expression: alert
                intervalMs: 1000
                maxDataPoints: 43200
                refId: condition
                type: threshold
          dashboardUid: n4WBsOJWk
          panelId: 19
          noDataState: NoData
          execErrState: Error
          for: 5m
          annotations:
            __dashboardUid__: n4WBsOJWk
            __panelId__: "19"
            description: ""
            runbook_url: ""
            summary: ""
          labels:
            "": ""
            customer: senst
            severity: CRIT
          isPaused: false

What can I do to get rid of this stale alert — and others of its kind?

Our datasource went offline on Saturday, and a whole lot of these alerts are still sticking around and firing.

Thanks!

Hey @senproingwersenk, try setting noDataState: OK.
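In your provisioned rule that would look roughly like this (a minimal sketch based on the config you posted; only the changed field matters, the rest is unchanged):

```yaml
rules:
  - uid: senst-qnapCpuTempCrit
    title: CPU Temperature (Critical)
    # ... data, condition, etc. as before ...
    noDataState: OK      # was: NoData — empty query results now resolve the alert
    execErrState: Error
    for: 5m
```

With `noDataState: OK`, an evaluation that returns no series resolves the alert instead of flipping it into (and keeping it in) the NoData state.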

I think what you are seeing is normal behavior when a query returns no data, which can happen if your datasource query filters out data points — your Flux query uses createEmpty: false, so windows without points produce no rows at all.
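For the instances that are already stuck, pausing and then resuming the rule forces Grafana to discard its current state. A sketch using the Alerting provisioning HTTP API (`GET`/`PUT /api/v1/provisioning/alert-rules/{uid}`); the URL, token, and the `X-Disable-Provenance` behavior for file-provisioned rules are assumptions — verify against your instance and Grafana version:

```python
import json
import urllib.request

GRAFANA_URL = "http://grafana.example.internal:3000"  # assumption: your instance URL
API_TOKEN = "REPLACE_ME"  # assumption: a service-account token with alerting permissions


def toggle_paused(rule: dict, paused: bool) -> dict:
    """Return a copy of an alert-rule payload with isPaused set."""
    updated = dict(rule)
    updated["isPaused"] = paused
    return updated


def set_rule_paused(rule_uid: str, paused: bool) -> None:
    """Fetch a rule via the provisioning API, flip isPaused, and PUT it back.

    Note: rules provisioned from files are read-only for this API by default;
    the X-Disable-Provenance header may be required (or edit the YAML instead).
    """
    url = f"{GRAFANA_URL}/api/v1/provisioning/alert-rules/{rule_uid}"
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
        "X-Disable-Provenance": "true",
    }
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        rule = json.load(resp)
    body = json.dumps(toggle_paused(rule, paused)).encode()
    req = urllib.request.Request(url, data=body, headers=headers, method="PUT")
    urllib.request.urlopen(req).close()


# Pause, wait at least one evaluation interval, then resume:
# set_rule_paused("senst-qnapCpuTempCrit", True)
# set_rule_paused("senst-qnapCpuTempCrit", False)
```

The pause/resume round trip clears the rule's evaluation state, so the stale NoData instances stop firing without deleting the rule.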