On Grafana Cloud, I have a Stat panel with the following Prometheus Instant query:
sum_over_time(sum(increase(processed_task_seconds_count{environment=~"$environment", instance=~"$instance"}[$__rate_interval]))[$__range:$__rate_interval])
The Stat panel then takes the Last* value to display the number of invocations of a processed_task method over the selected range.
This works fine for ranges of up to 32 days. But I need to display the value for the previous fiscal quarter, in which case Grafana throws an error and shows No data.
The error is the following:
execution: the query time range exceeds the limit (query length: 768h16m3.934s, limit: 768h0m0s) (err-mimir-max-query-length). To adjust the related per-tenant limit, configure -querier.max-partial-query-length, or contact your service administrator.
I realized that the problem is passing a $__range value that is too long to the sum_over_time function. So I tried an alternative approach: instead of an Instant query, I used a Range query and configured the Stat panel to show the Total value instead of Last*. In theory, this should have worked, but the returned value is completely wrong.
Here is my alternative Range query:
sum(increase(processed_task_seconds_count{environment=~"$environment", instance=~"$instance"}[$__rate_interval]))
This does not give me the err-mimir-max-query-length error, but the Total calculated from this by the Stat panel is far from reality.
When comparing the two over a shorter interval of 1 day, where both work, here are the results:
- Instant sum_over_time = 482K
- Range with Total calculated by the panel = 1.93M
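One thing I noticed is that the two numbers differ by a factor of almost exactly 4. I am not sure this is the cause, but my guess is that in the Range query each $__rate_interval window spans roughly four evaluation steps, so the panel's Total counts every increase about four times. Below is a toy sketch of that idea in Python. The counter, the 60-second window and the 15-second step are made-up values for illustration, not numbers taken from my setup.

```python
# Toy model (all numbers are illustrative assumptions, not real metrics):
# a counter that grows by exactly 1 per second for one hour.
def counter_at(t: int) -> int:
    return t


def increase(t: int, window: int) -> int:
    """Exact increase of the toy counter over the window ending at t."""
    return counter_at(t) - counter_at(t - window)


DURATION = 3600  # seconds of data
WINDOW = 60      # stands in for $__rate_interval
STEP = 15        # stands in for the Range query step (assumed to be WINDOW / 4)

# Instant query with a subquery: evaluated every WINDOW seconds, so windows do not overlap.
non_overlapping = sum(
    increase(t, WINDOW) for t in range(WINDOW, DURATION + 1, WINDOW)
)

# Range query summed by the panel: evaluated every STEP seconds, so windows overlap.
overlapping = sum(
    increase(t, WINDOW) for t in range(WINDOW, DURATION + 1, STEP)
)

print(non_overlapping, overlapping, overlapping / non_overlapping)

# Ratio of the two values observed on my 1-day panel.
print(1_930_000 / 482_000)
```

In this toy model the overlapping sum comes out close to 4 times the non-overlapping one, which is about the same ratio as between my two panel values. If that assumption holds, the Total would only be correct when the query step equals the window length.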
For short ranges, the Instant sum_over_time works well. But once in a while I need to look beyond 32 days, and it seems I am left with either a broken panel or a panel that shows totally unrelated numbers, which is arguably even worse than the broken panel.
I am fairly new to Prometheus, so any advice is appreciated.