Alert keeps firing even after the issue has been resolved

The app is running on Kubernetes as a StatefulSet, and the issue was resolved by a new instance. Even though the alert’s conditions are no longer met (or at least should not be), the alert is still firing, presumably because it’s set to "noDataState": "Alerting" and the old instance is no longer reporting any data (that’s the only explanation I can think of). I’ve attached the panel screenshot and the alert’s config.

Can the alert’s config be improved, or is this a genuine bug? I’d like to keep the "noDataState": "Alerting" setting.


Zoomed in to confirm that the metric value is indeed > 0:

(The app has a low load and the graph is squished in the first screenshot, as I wanted to capture everything in one image.)

modify-export-Infra-eeewt539dnym8f-1744799754862.json (1.6 KB)
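Roughly, the relevant parts of the export (field names follow Grafana’s rule export format; apart from the 6h time range, the metric, and noDataState, the values here are placeholders):

{
  "title": "Docs indexed",
  "condition": "C",
  "data": [
    {
      "refId": "A",
      "relativeTimeRange": { "from": 21600, "to": 0 },
      "model": { "expr": "rate(docs_indexed{service=\"app\",environment=\"production\"}[5m])" }
    }
  ],
  "noDataState": "Alerting",
  "for": "5m"
}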

Related: Grafana sends the firing status although the alert is resolved

Resolved by changing the ‘Time range’ from 6h to 2h ("relativeTimeRange" in JSON). 6h was the default inherited from the panel this alert belongs to.
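In the exported JSON the change boils down to the query’s relativeTimeRange (values are in seconds); before, with the 6h panel default:

"relativeTimeRange": { "from": 21600, "to": 0 }

and after, with 2h:

"relativeTimeRange": { "from": 7200, "to": 0 }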

This being “solved”, a follow-up question: to avoid false alarms, what should an alert’s time range be set to, especially in environments such as Kubernetes where targets come and go? Should it simply always equal the evaluation period (plus some buffer)?

Make sure that the alert query returns a single time series. Right now you have two time series (they have different instance labels).


Thanks! 🙏 Would this be the correct query?

sum (
  rate(docs_indexed{service="app",environment="production"}[5m])
)

I could sum by (instance), but this would still alert for each pod separately (there’s pod="app-0", pod="app-1", etc. in the labels).