I’m using Grafana with a Prometheus datasource. One of the metrics I’m tracking is the rate of RPC errors in my services. Normally the rate is zero, but sometimes I’ll get brief spikes of dozens or hundreds of RPC errors in a minute.
However, I’m finding that the visibility of these spikes depends on the zoom level. If I’m zoomed in to a day, I might see a spike, but zooming out to three or seven days can make the spike disappear entirely, leaving a line that reads zero for the whole time period. That zeroed line is misleading, since there actually were errors during that period.
Are there any recommendations or best practices for making sure that short, sharp spikes are visible at all zoom levels? Below is an example query I’ve been using:
sum by (grpc_service, grpc_method, grpc_code) (irate(grpc_server_handled_total{grpc_code="DeadlineExceeded"}[1m]))
Using $__interval does not prevent spikes from disappearing. This is well illustrated in the issue linked below.
Either the rate function has to include data points from outside the given range, or Grafana has to provide an interval value that leads to slight oversampling. Björn opened an issue that would address this: https://github.com/grafana/grafana/issues/21417
Just adding this comment in case other people who are less familiar with the details stumble upon it.
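In the meantime, one query-side workaround (just a sketch, not something from that issue) is to wrap the expression in a subquery and take the maximum over each rendered interval, so the highest per-minute rate inside the interval survives the downsampling. This assumes a datasource with PromQL subquery support (Prometheus 2.7+) and uses rate instead of irate, since irate only looks at the last two samples before each evaluation point:

max_over_time(
  sum by (grpc_service, grpc_method, grpc_code) (
    rate(grpc_server_handled_total{grpc_code="DeadlineExceeded"}[1m])
  )[$__interval:]
)

Grafana still samples the result at its chosen step, but each point now reflects the maximum over the whole interval rather than a single instant, so short spikes are not stepped over.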
Hi, I have the same issue. My metric is usually 0, but it spikes 1-10 times a week. I’m on Grafana 8.5.21 with VictoriaMetrics as the datasource.
My solution was to change the Max data points setting under the panel’s Query options.
I kept increasing it until the resolved Interval became 10s, matching my scrape period. Since then I can see every spike.
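For reference, the resolved interval is roughly the selected time range divided by Max data points, so the value you need grows as you zoom out. As an illustrative calculation (using the seven-day window from the original question and a 10s scrape interval; the numbers are just an example), the required point count works out as:

7 * 24 * 3600 / 10

That comes to 60480, so Max data points needs to be on that order for a 7-day window before the interval drops to the scrape period and no scrape is skipped between plotted points.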