Agent scrape_interval breaks CPU chart

  1. 10s

a)

...
metrics:
  wal_directory: /tmp/agent
  global:
    scrape_interval: 10s
...

Grafana Agent config

b)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

works

  2. 60s

a)

...
metrics:
  wal_directory: /tmp/agent
  global:
    scrape_interval: 60s 
...

Grafana Agent config

b) 1m
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

no data

c) 5m
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

shows a chart with a value every 15 seconds


Why? How to fix?

scrape_interval: 60s

What does it do? Does it collect data every 60 seconds, or at an interval of 60s between samples? So for each 60 seconds will I have one value or more? If one, then why do I see a value on the chart every 15 seconds? Actually, with whatever setting I use, I always see a point every 15 seconds.

What is happening here?

Why rate?
I read that rate computes the difference between the start and end points. We want the raw value, not a difference between points on the timeline, right? Unless these things mean something different than I expect.
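What I read so far, roughly (the numbers below are made up by me, just to check my understanding):

rate(node_cpu_seconds_total{mode="idle"}[1m])
  ≈ (idle_seconds_now - idle_seconds_1m_ago) / 60
  e.g. (54321.0 - 54264.0) / 60 = 0.95, i.e. the CPU was idle ~95% of that minute

But that is still a difference, not the raw value, which is exactly what confuses me.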

But first of all, why are the points on the chart always 15 seconds apart, and why does scrape_interval: 60s blow up the chart?

How should I do it?

I guess this blog post may help you:


100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[$__rate_interval])) * 100)

no data

How can I read the $__rate_interval value?

For example, you can use that variable in the panel title, so you will have the current value visible in the UI.
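For example, a panel title like this (just a sketch; any title containing the variable should work, Grafana substitutes it when the panel renders):

CPU usage (rate interval: $__rate_interval)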

I have found the value in the “inspector”.

Let me know if my conclusion is correct:

the Agent scrape_interval value

metrics:
  wal_directory: /tmp/agent
  global:
    scrape_interval: 60s

should be equal to the Prometheus scrape_interval

global:
  scrape_interval: 15s

to make things consistent and simple to use.

This raises two questions:

  1. Why is the default interval for Prometheus 15s, but for the Agent 60s?
  2. What is the “best practice” value?

edit:

not to

global:
  scrape_interval: 15s

but to the Grafana “Interval behaviour / Scrape interval” setting in the data source.

1.) I don’t know :man_shrugging:. Money? I.e. to save storage/infra cost? I believe 15 sec is overkill for 99% of people anyway (for non-rate metrics).
2.) See the linked blog post:

It is recommended you use the same scrape interval throughout your organization

But of course you may have your own requirements, e.g. a rate metric with 10-second precision, and then 15s is not the right setup for that requirement.
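If you do need something like 10-second precision for a rate, the idea would be to lower both the Agent scrape interval and the data source interval together, e.g. (a sketch only, the exact values depend on what you really need):

metrics:
  global:
    scrape_interval: 5s

datasources:
  - name: Prometheus
    type: prometheus
    jsonData:
      timeInterval: 5s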

The final solution is:

agent

metrics:
  global:
    scrape_interval: 60s

Grafana Prometheus data source provisioning

datasources:
  - name: Prometheus
    type: prometheus
    jsonData:
      timeInterval: 60s

which makes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[$__rate_interval])) * 100)

work as expected
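If I understand the Grafana docs correctly (my reading, not something confirmed in this thread), $__rate_interval resolves to roughly max(__interval + scrape interval, 4 * scrape interval), where “scrape interval” here is the data source timeInterval. So with timeInterval: 60s you get, for example:

max(30s + 60s, 4 * 60s) = 4m

and every rate window spans several 60s samples, so the query always has at least two points to work with.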
