What’s the group interval on your alert rule? Can you share the config of your alert (query and expressions, group interval, the For setting, and also the No Data and Error handling settings; see the screenshot below)?
The Prometheus docs mention that it’s a distributed system and some latency is unavoidable. Therefore, if your alert is too strict, the data might not be available at the moment the rule is evaluated. An easy workaround is to set Alert state if no data or all values are null to Keep Last State / Normal / Alerting (whatever suits your case best) instead of the default No Data, which fires an alert and a notification the moment the first No Data appears.
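If you provision alert rules from files rather than the UI, that same setting can be expressed in the rule definition. A minimal sketch, assuming Grafana's file-provisioning schema for alert rules; the group/rule names and the query section are placeholders, and the exact accepted values (e.g. whether `KeepLast` is available) depend on your Grafana version:

```yaml
# Hypothetical provisioned alert rule fragment (Grafana alerting file provisioning).
# Field names follow the provisioning schema; values here are illustrative only.
apiVersion: 1
groups:
  - orgId: 1
    name: my-rules            # hypothetical rule group
    folder: alerts
    interval: 1m              # group evaluation interval
    rules:
      - uid: my-alert-uid     # hypothetical
        title: My alert
        condition: C
        for: 5m
        noDataState: OK       # or KeepLast / Alerting instead of the default NoData
        execErrState: Error
        data: []              # query and expressions omitted for brevity
```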
Well, I changed the alert’s For value from 1 minute to 5 minutes, hoping the alert will trigger only if the problem lasts for 5 minutes. Hopefully this will do the trick.
What do you think?
If not, I’ll post all the configuration here for you.
If the Alert state if no data or all values are null setting is still set to No Data, it doesn’t really matter how the For setting is configured: the alert will still fire (you’d notice No Data somewhere in the notification title).
The rule uses an instant query. Prometheus returns the most recent point within the interval [now - lookback_period, now]. The lookback period is configured in Prometheus via --query.lookback-delta, which is 5 minutes by default.
If the 5-second scrape interval is correct, then there must be enough points in that interval. Try exploring the metric in Explore and see whether points are returned.
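To make the "enough points" claim concrete, here is a quick sanity check, assuming the 5-second scrape interval mentioned in the thread and Prometheus's default 5-minute lookback window:

```python
# With a 5 s scrape interval, how many samples should fall inside
# Prometheus's default lookback window (--query.lookback-delta = 5m)?
scrape_interval_s = 5       # assumed scrape_interval from this thread
lookback_delta_s = 5 * 60   # Prometheus default lookback delta

expected_samples = lookback_delta_s // scrape_interval_s
print(expected_samples)     # roughly 60 samples per series in the window
```

So an instant query should only come back empty (NoData) if none of those ~60 scrapes actually produced a sample, which points at a scrape or target problem rather than the alert rule itself.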
To mitigate the problem until you figure out what is happening, try using Keep Last State. I do not remember when it was re-introduced, so you may need to upgrade to use it. Otherwise, you can map No Data to Normal.
Run the query in Range mode over the range from X - 1m to X, where X is the timestamp when the NoData alert instance was created. The timestamp can be taken from the logs or from the rule history.
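You can also run that range query directly against the Prometheus HTTP API (`/api/v1/query_range`). A small sketch that builds the request parameters for the X - 1m to X window; the timestamp, metric selector, and step are hypothetical placeholders you would replace with your own values:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical timestamp X at which the NoData alert instance was created
# (take the real value from the logs or the rule history).
x = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Parameters for a range query over [X - 1m, X]; send the resulting path
# to your Prometheus server's HTTP API.
params = urlencode({
    "query": 'up{job="my-job"}',                    # hypothetical selector
    "start": (x - timedelta(minutes=1)).timestamp(),
    "end": x.timestamp(),
    "step": "5s",                                   # match the scrape interval
})
print("/api/v1/query_range?" + params)
```

If the response contains no points for that window, the scrape itself was missing data at that moment, which matches what the alert saw.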