I want to use max_over_time to find spikes in a gauge metric that measures CPU usage.
However, when I execute the PromQL query in my dashboard, some data is missing.
I also have a graph displaying these spikes, but the main problem with this graph is that it smooths away the spikes I am looking for when I zoom out to a larger time range.
I do not want to go through multiple hour-long time frames just to find the spikes manually, so I used the following query to display spikes in a table:
sum by (cpu) (max_over_time(system_cpu_usage[10m:15s])) > $Threshold
$Threshold is my dashboard variable for the percentage of CPU usage that is considered a spike (above normal).
My instance of Prometheus Mimir takes a value every 15s, so with [10m:15s] I am collecting every value and calculating the max over 10 minutes.
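To make clear what I expect the query to compute, here is a rough Python sketch of my mental model. It is not PromQL and not how Mimir actually evaluates the query; the sample data, the function names, the left-open window, and the 10-minute evaluation times are all assumptions I made up for illustration.

```python
from typing import List, Optional, Tuple

SCRAPE_INTERVAL_S = 15   # one sample every 15s, as in my setup
WINDOW_S = 600           # the 10m range from the query

Sample = Tuple[int, float]  # (timestamp in seconds, CPU usage in percent)


def max_over_window(samples: List[Sample], eval_time: int,
                    window_s: int = WINDOW_S) -> Optional[float]:
    """Hypothetical helper: max of all samples in (eval_time - window_s, eval_time]."""
    in_window = [v for t, v in samples if eval_time - window_s < t <= eval_time]
    return max(in_window) if in_window else None


def find_spikes(samples: List[Sample], eval_times: List[int],
                threshold: float) -> List[Sample]:
    """Keep only the evaluation times whose windowed max exceeds the threshold."""
    result = []
    for t in eval_times:
        m = max_over_window(samples, t)
        if m is not None and m > threshold:
            result.append((t, m))
    return result


# Made-up data: one hour at 20% with a single 15s spike to 95% at t=1800.
samples = [(t, 95.0 if t == 1800 else 20.0)
           for t in range(0, 3600, SCRAPE_INTERVAL_S)]

# Assumed evaluation times: one point every 10 minutes.
eval_times = list(range(600, 3601, 600))

found = find_spikes(samples, eval_times, threshold=80.0)
print(found)
```

In this model, even a spike that lasts for a single sample ends up in exactly one 10-minute window, so I would expect it to appear as one row in the table.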
Still, values go missing for some reason when the CPU usage does not stay above the threshold for multiple minutes.
I tried changing the minimum step, but the higher I set it, the worse the problem became.