Very different values for different time ranges

Grafana shows very different CPU values on different time ranges. This is the query:

100 - (avg by (cpu) (irate(node_cpu{mode="idle", instance=~"$server"}[1m])) * 100)

When I choose a 6-hour time range, the percentage over the same period is much higher than with a 7-hour time range, probably because of a spike around 7 hours ago.
Please compare the two screenshots. The 7-hour range:

(screenshot)

And the 6-hour range:

(screenshot)

This looks like a Grafana bug. I’m using Ubuntu 16.04.5 with Grafana 5.2.2.

What’s the frequency of your data points (how many per minute)? Does it help if you increase the range vector selector to "[5m]"?
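
For example, the same query with a 5-minute window (the original expression with only the range changed) would be:

100 - (avg by (cpu) (irate(node_cpu{mode="idle", instance=~"$server"}[5m])) * 100)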

Hi,
Thanks for the answer.
As can be seen in the query, the frequency is 1 minute. I changed it to 5 minutes. It’s exactly the same.
[update]
I changed it to 5 minutes and then back to 1 minute, then changed the time range several times and now it seems to be displaying the correct data. I’m not sure what’s going on. I’ll leave it be for a while and get back to you.
It’s nonetheless rather annoying, because the values are (were) misrepresented.

By the way, the graph’s resolution setting is 1/10. I’m not sure that’s the right one, but other resolutions seem to display the data completely out of proportion.

After playing with it a little, it’s still the same; I’ve got the same problem. If the time range is longer than a certain length (and, essentially, includes certain spikes), the graph doesn’t resolve the differences and doesn’t keep the proportions. It’s as simple as that.

I would like to revive this thread, as I really don’t understand why Grafana still behaves like this a year later, with very different metrics.
I’ve got the following screenshots:
This one shows the last hour:

(screenshot)

Whereas this one shows the last three hours:

(screenshot)
As you can see, I get extremely different values, which makes absolutely no sense to me. Can anyone explain to me what is going on?

These are the queries I’m using:

irate(node_disk_reads_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z][a-z]"}[1m])
irate(node_disk_writes_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z][a-z]"}[1m])

The unit I’m using is I/O ops/sec.

I’m using Grafana 6.4.4 from the official Docker image.

I had problems like this on our university HPC cluster when we used Prometheus,
but since we switched to InfluxDB/Telegraf the problem is gone.

Did you check if the query graph in Prometheus itself shows the same behavior?


The issue is with the irate() function. It calculates the rate from only the last two adjacent samples in the range instead of from all the samples in each step between points on the graph. When the graph’s time range changes, the step changes, and therefore the set of samples used for the irate() calculation also changes, so you see a completely different graph. The solution is to use rate() instead of irate(). See the following article for details: Why irate from Prometheus doesn't capture spikes | by Aliaksandr Valialkin | Medium
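
For example, the queries from this thread rewritten with rate() (keeping the original label matchers and assuming a window of a few scrape intervals, e.g. 5 minutes) would look like this:

100 - (avg by (cpu) (rate(node_cpu{mode="idle", instance=~"$server"}[5m])) * 100)
rate(node_disk_reads_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z][a-z]"}[5m])
rate(node_disk_writes_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z][a-z]"}[5m])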


So is irate basically crap?
