How to troubleshoot alerting

I have a custom status metric scraped by Prometheus via node_exporter text collector with discrete values: -1 (Not applicable), 0 (OK), 1 (Warning), 2 (Error).

I’m trying to alert on only if the metric value is 2 (Error). I have just one of my servers of a hundred that ALWAYS has a value a value of 2. When I setup the alert it did fire but then occasionally it resolves and then fires again.

I confirmed that the value never changed from 2 so don’t know why the alert thinks it resolved.

I’m using Grafana v10.4.0.
For Reduce the function is Last and mode is Drop Non-numeric Value (but I think I’ve tried Strict as well with same symptoms).
For Threshold, I’ve set “is above” to 1 and Customer recovery threshold is not set.

Does anybody have any suggestions for troubleshooting or a way to avoid false resolves?

Thanks! Thomas Kenny

Enable/check alert state history

1 Like

Being new to alerting this is very valuable information. Thank you very much!