We have a problem with alerting on certain metrics:
This image comes from a chart displaying load (in percentage) for servers. Just before the alert is triggered, a couple of new servers are launched. The alert condition is "avg() of query(A, 10m, now) is above 75", with "If no data or all values are null SET STATE TO No Data" and "If execution error or timeout SET STATE TO Alerting".
As you can see, even though we check the average over 10 minutes, the alert fires immediately: each new server is essentially a new data series, so there aren't 9 minutes of earlier data to bring the average down. You effectively get the average of only one or two points, and starting servers tend to use more CPU before settling.
Is there some way to prevent an alert from triggering when there is not enough data for a series, or some other way to prevent these false positives?
I'm not sure if it helps, but just in case: the data source here is Prometheus.
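One idea I've been experimenting with, since the data source is Prometheus, is to move the guard into the query itself: only return the average for series that already have a full window of samples. A rough sketch (the metric name `server_load_percent` and the sample-count threshold are placeholders; the threshold depends on your scrape interval, e.g. ~40 samples for a 10m window at a 15s scrape):

```promql
# Yield the 10m average only for series with enough samples in the
# window; freshly started servers are filtered out until they do.
avg_over_time(server_load_percent[10m])
  and
count_over_time(server_load_percent[10m]) >= 40
```

With "If no data or all values are null SET STATE TO No Data" already configured, a series filtered out this way would just produce no data instead of an alert. I don't know if this is the idiomatic solution, though.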
Apologies if this has been answered before, but I lack the proper context to know good terms to search for.