I am using Grafana in a proof-of-concept project. I feed the network traffic summary and the latency data of about 500 networks and their 2 ISP connections into InnoDB. I created one dashboard per network in Grafana, using the provisioning directory. Both InnoDB and Grafana run as containers. A small Python script feeds the data into InnoDB, and another one configures Grafana. In total I have about 500 dashboards.

My goal is to embed these dashboards into a web application, so Grafana would be used only on demand, when it needs to render a graph (or a dashboard).

Now my problem is that even when no one is accessing Grafana, there is quite substantial load on the server. The uptime command shows numbers like 300 or even more. The server is still quite responsive, so the overall result is not too bad. Looking at htop I can see that the load jumps up and down: for some seconds all cores are idling, then usage goes up to 100%, then back to 0% again. Grafana has hundreds of threads running.

My initial thought was that if I switched off alerting, Grafana would stop checking and the load would drop to near zero. Unfortunately, that did not do the trick. I tried to search, but I could not find anything relevant. Could anyone please point me in the right direction? Is this even possible with Grafana?
Why is it bad? I guess you mean the load average - it just says that 300 processes are waiting for something. That can be storage (slow storage, low IOPS performance), network (slow network), …
It doesn't indicate that Grafana is doing something; it can be that InnoDB is just writing (processing, compacting, indexing, …) something. You need to find which processes are waiting and why. In general, a high load does not mean anything is overloaded.
Well, it is Grafana, for a fact, as I checked that. Otherwise I would not be posting here… My point is that Grafana should not be doing anything. Normally only the InnoDB part should be working, as data is fetched from the source and pushed into InnoDB. The Grafana part should only come into play when I want to see the graphs. This is what I am not able to achieve, as Grafana is doing something, and I do not know what. Maybe this is how it works and it cannot be changed?
Stop Grafana and check the current 1-minute load average from uptime. If it stays high, then provide a fully reproducible Grafana setup (not just "I have some Grafana which has a high load") plus iostat, vmstat and netstat outputs.
You would be surprised how many people post performance issues here that end up being something other than Grafana: badly designed tables being queried, badly designed InfluxDB fields/tags, etc.
This is how I started Grafana:
docker run \
  -d \
  --name grafana \
  -p 3000:3000 \
  -e GF_SECURITY_ALLOW_EMBEDDING=true \
  -e GF_ALERTING_ENABLED=false \
  --volume /opt/grafana:/var/lib/grafana \
  --volume /opt/grafana_provisioning:/etc/grafana/provisioning/ \
  --restart always docker.io/grafana/grafana-oss
Data source:
/opt/grafana_provisioning/datasources/influxdb.yaml
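The file follows the standard datasource provisioning schema, roughly like the sketch below; the URL, database name and credentials are placeholders here, not my exact values:

apiVersion: 1

datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://localhost:8086     # placeholder; depends on how the containers reach each other
    database: network_metrics      # placeholder database name
    isDefault: true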
OK, so InnoDB is actually InfluxDB. We still don't know the Grafana version (I will "love" you if you say it's "latest", because "docker.io/grafana/grafana-oss" is of course the latest image).
iostat, vmstat, netstat, docker stats outputs? You provided nothing, just a single number (the load), which on its own can mean nothing. Enable tracing for your Grafana and check the traces - you will see what Grafana is doing.
vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
  r  b   swpd   free   buff   cache  si  so  bi  bo  in  cs us sy id wa st
117  0  67840 315304 255932 2025040   0   0   2 485  29   5 32 23 45  0  0
docker stats
ID NAME CPU % MEM USAGE / LIMIT MEM % NET IO BLOCK IO PIDS CPU TIME AVG CPU %
704686452b57 grafana 600.90% 659.8MB / 4.101GB 16.09% 355.9kB / 222.5kB 11.85MB / 87.18MB 848 41m40.421414s 300.45%
bd23b0bc3e75 influxdb 0.67% 425.1MB / 4.101GB 10.37% 8.444GB / 20.47GB 598.4MB / 141.1GB 23 12h55m11.388838s 0.33%
docker ps
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bd23b0bc3e75 docker.io/library/influxdb:latest influxd 4 months ago Up 2 days ago 0.0.0.0:8086->8086/tcp influxdb
704686452b57 docker.io/grafana/grafana-oss:latest 23 minutes ago Up 12 minutes ago 0.0.0.0:3000->3000/tcp grafana
netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:52346 localhost:8086 TIME_WAIT
tcp 0 0 docker:32914 158.115.147.239:https TIME_WAIT
tcp 0 0 localhost:52342 localhost:8086 ESTABLISHED
tcp 0 0 localhost:57442 localhost:8086 TIME_WAIT
tcp 0 0 docker:ssh 10.10.0.61:57550 ESTABLISHED
tcp 0 0 docker:40942 158.115.147.239:https TIME_WAIT
tcp 0 0 docker:43662 158.115.147.212:https TIME_WAIT
tcp 0 0 docker:40928 158.115.147.239:https ESTABLISHED
tcp 0 244 docker:ssh 10.10.0.61:60724 ESTABLISHED
tcp 0 0 localhost:46840 localhost:5000 TIME_WAIT
tcp 0 0 localhost:57446 localhost:8086 TIME_WAIT
Active UNIX domain sockets (w/o servers)
[cut for saving some space]
I am still struggling with the traces. I enabled the trace, I have the file, I have Go installed, and this is the result:
go tool trace /opt/grafana/trace.out
2024/01/31 18:20:17 Parsing trace…
failed to parse trace: no EvFrequency event
Please format your output next time (don't torture us, please). All those stats commands have parameters, but it's still better than nothing. You have a problem with IOPS. Use the standard debug approach (which you should have used from the start): increase the log level and watch the Grafana server logs. Use app tracing, not Go runtime tracing: Configure Grafana | Grafana documentation
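For a container that usually means setting environment variables that map onto grafana.ini sections; something along these lines (a sketch only - double-check the exact keys against the documentation for your version):

-e GF_LOG_LEVEL=debug
-e GF_TRACING_OPENTELEMETRY_JAEGER_ADDRESS=localhost:6831

The first maps to [log] level, the second to the address key of the [tracing.opentelemetry.jaeger] section, so Grafana exports its own application traces to a Jaeger agent instead of you trying to read a Go runtime trace with go tool trace.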
Blind guess: you mixed up dashboards (dashboard UIDs, versions, names), so now provisioning is confused and keeps overwriting dashboards in the DB over and over and over…
I also wonder if these settings are affecting things. Set log_queries for a small time period to see what is happening; I am not sure whether it is set to true by default.
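Assuming this refers to Grafana's own [database] log_queries option, it can be switched on temporarily for the container with the usual env-var mapping, e.g.:

-e GF_DATABASE_LOG_QUERIES=true

which logs the SQL calls Grafana makes against its internal database together with their execution times.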
Found what the issue was. I raised the log level by specifying -e GF_LOG_LEVEL=debug for the container. The log was full of msg="Start walking disk" messages. Apparently, by default the provisioned dashboards are checked for changes quite often. I added "updateIntervalSeconds: 600" to the dashboard YAML files and now the load stays nicely below one. I could even raise this number much higher, since I can notify Grafana whenever I change the YAML files. Thanks to you all, guys!
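A dashboard provider file with that setting looks roughly like the sketch below (provider name and path are placeholders, not my actual ones; updateIntervalSeconds is the relevant part):

apiVersion: 1

providers:
  - name: networks                     # placeholder provider name
    type: file
    updateIntervalSeconds: 600         # only re-scan the dashboard files every 10 minutes
    allowUiUpdates: false
    options:
      path: /etc/grafana/provisioning/dashboards   # placeholder path to the dashboard JSON files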
Hello,
We are experiencing a performance problem with the new version of Grafana (v11.0.0), similar to the problem described above.
The server hosting our MySQL database is heavily used by the grafana.exe service (90 Mb/s). When we were on a much older version of Grafana (v8), we didn't have this consumption.
I have the impression that this is because Grafana sends all the requests at once: when I open my home dashboard, all my panels load very quickly (or are already loaded), unlike the old version of Grafana, where each panel would load as you scrolled through the dashboard.
Do you know how to remedy this consumption problem? Is it possible to configure Grafana so that it doesn't load all requests at once?
Thanks