Agent scrape_interval breaks CPU chart

  1. 10s


  wal_directory: /tmp/agent
    scrape_interval: 10s

Grafana Agent config

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)


  2. 60s


  wal_directory: /tmp/agent
    scrape_interval: 60s 

Grafana Agent config

b) 1m
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

No data.

c) 5m
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Shows a chart with a value every 15 seconds.

Why? How can I fix it?

scrape_interval: 60s

What does it do? Does it collect data every 60 seconds, or at intervals of 60 seconds? So every 60 seconds will I have one value or more? If one, why do I see a value every 15 seconds on the chart? Actually, whatever the setting, I always see a value every 15 seconds.

What is happening here?

Why rate?
I read that rate computes the difference between the start and end points. We want the raw value, not a difference between points on the timeline, right? Unless these things mean something different from what I expect.
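To illustrate why rate is used here: node_cpu_seconds_total is a counter that only ever grows, so its raw value is not a usage figure. A minimal sketch of the arithmetic, assuming two hypothetical samples and ignoring the extrapolation and counter-reset handling that the real rate() performs:

```python
def simple_rate(first, last):
    """Per-second increase between two (timestamp, counter_value) samples.

    Simplified: the real PromQL rate() also extrapolates to the window
    edges and handles counter resets.
    """
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


# Hypothetical idle-CPU counter samples for one core, taken 60 s apart.
first_sample = (0, 1000.0)
last_sample = (60, 1054.0)

idle_rate = simple_rate(first_sample, last_sample)  # idle seconds per second
cpu_usage_percent = 100 - idle_rate * 100           # same shape as the query

print(round(cpu_usage_percent, 1))
```

The core was idle for 54 of the 60 seconds, so the idle rate is 0.9 and the query's `100 - ... * 100` turns that into 10% usage.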

But first of all, why are the points on the chart always 15 seconds apart, and why does a scrape_interval of 60s break the chart?

How should I do it?
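A rough sketch of what is probably going on, under stated assumptions: a range query evaluates the expression at every step (15 s is assumed here, matching the chart), independently of how often samples are scraped, and rate needs at least two samples inside its window. The sample timestamps, the counter slope, and the left-open window boundary are illustrative assumptions, and the rate is simplified (no extrapolation, no counter resets).

```python
def samples_in_window(samples, t, window):
    """Samples whose timestamp falls in (t - window, t]."""
    return [(ts, v) for ts, v in samples if t - window < ts <= t]


def simple_rate_at(samples, t, window):
    """Simplified rate(): None ("no data") unless the window has >= 2 samples."""
    pts = samples_in_window(samples, t, window)
    if len(pts) < 2:
        return None
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


SCRAPE = 60  # agent scrape_interval in seconds
STEP = 15    # assumed query step in seconds

# Hypothetical counter growing by 0.9 per second, scraped at t = 7, 67, 127, ...
samples = [(ts, 0.9 * ts) for ts in range(7, 900, SCRAPE)]

# The query is evaluated at every step, not at every scrape.
eval_times = list(range(600, 661, STEP))

rate_1m = [simple_rate_at(samples, t, 60) for t in eval_times]
rate_5m = [simple_rate_at(samples, t, 300) for t in eval_times]

print(rate_1m)  # every entry is None: a 60 s window holds only one sample
print(rate_5m)  # one value per 15 s step, although scrapes are 60 s apart
```

In this sketch the [1m] window never contains two samples, which matches the "no data" result, while the [5m] window yields a point at each 15 s step.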

I guess this blog post may help you:


100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[$__rate_interval])) * 100)

No data.

How can I read the value of $__rate_interval?

For example, you can use that variable in the panel title, so the current value is visible in the UI.

I found the value in the “inspector”.

Let me know if my conclusion is correct:

The agent scrape_interval value

  wal_directory: /tmp/agent
    scrape_interval: 60s

should be equal to the Prometheus scrape_interval

  scrape_interval: 15s

to make things consistent and simple to use.

This raises two questions:

  1. Why is the default interval 15s for Prometheus but 60s for the agent?
  2. What is the “best practice” value?


It should be equal not to

  scrape_interval: 15s

but to the “Interval behaviour / Scrape interval” setting of the Grafana data source.

1.) I don’t know. Money, perhaps, i.e. to save storage/infra cost? I believe 15s is overkill for 99% of people anyway (for non-rate metrics).
2.) See the linked blog post:

It is recommended you use the same scrape interval throughout your organization

But of course you may have your own requirements, e.g. a rate metric with 10-second precision, in which case 15s is not the right setup.

The final solution is:


    scrape_interval: 60s

Grafana Prometheus provisioning

  - name: Prometheus
    type: prometheus
      timeInterval: 60s

which makes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[$__rate_interval])) * 100)

work as expected.
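For anyone wondering why the data source setting matters: as far as I understand the Grafana documentation, $__rate_interval is the larger of `$__interval + scrape interval` and `4 * scrape interval`, where the scrape interval is the data source setting (15s by default). A small sketch of that arithmetic, with assumed $__interval values:

```python
def rate_interval(interval_s, scrape_interval_s):
    """$__rate_interval in seconds, per the formula in the Grafana docs:
    max($__interval + scrape interval, 4 * scrape interval)."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Data source left at the default scrape interval of 15 s
# (the $__interval value of 15 s is an assumption for illustration).
default_window = rate_interval(15, 15)

# Data source set to timeInterval: 60s, as in the final solution
# (the $__interval value of 60 s is an assumption for illustration).
fixed_window = rate_interval(60, 60)

print(default_window)  # 60
print(fixed_window)    # 240
```

With the default setting the window is only 60 seconds, which with 60s scraping usually holds a single sample, hence no data. With timeInterval set to 60s the window becomes 240 seconds and always covers several samples.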
