High vCPU and RAM consumption

Hello, the Grafana server has very high CPU and memory consumption, and at times the Grafana service stops. The server is currently configured with 8 vCPUs, 32 GB of RAM and a 100 GB SSD. Is there any performance tuning that can be done in the Grafana application or in the Zabbix plugin?

The high vCPU and RAM consumption occurs when the Zabbix plugin starts synchronizing.

Hi @fernandojrdasilva, welcome to the community,

I don’t think this is a core Grafana issue. From what you’ve shown about your case so far, I’d wonder whether the queries you’re running against Zabbix are quite large, and whether you’re applying some of the plugin’s built-in functions to them.

I’d advise you to segment the data in your queries and try to migrate some of the plugin functions to Grafana’s transformations. That should make better use of the available resources.

Another option, if it’s available to you, would be to get Grafana Enterprise and use its query caching to help performance.
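One way to read "segment the data" is to split one query over a long time range into several smaller queries over shorter windows, so that no single request has to pull everything at once. The sketch below only shows that splitting step; `split_time_range` is a hypothetical helper, and how each window is then fed to the Zabbix plugin or a dashboard panel is left open.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, window):
    """Split [start, end) into consecutive windows no longer than `window`.

    Hypothetical helper: each (from, to) pair could back a smaller query
    instead of one query spanning the whole range.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    windows = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + window, end)  # last window may be shorter
        windows.append((cursor, nxt))
        cursor = nxt
    return windows


if __name__ == "__main__":
    start = datetime(2024, 4, 1, tzinfo=timezone.utc)
    end = start + timedelta(days=7)
    # One week split into one-day windows.
    for frm, to in split_time_range(start, end, timedelta(days=1)):
        print(frm.isoformat(), "->", to.isoformat())
```

Smaller windows trade one heavy request for several lighter ones; whether that actually lowers peak CPU and RAM on the Grafana host depends on how the plugin processes each response.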

Hope it helps.

Hi guys, I’m new to the community and have been using Grafana for about a year.
Since I’ve had behavior similar to the OP’s for some time and didn’t find anything recent in the forum, I’m adding my issue here.
I’m using Grafana with Home Assistant, and in the beginning I had some lockups in the morning (DB maintenance?).

For a few months this container (LXC on Proxmox) ran without a problem, but for about 2-3 weeks now I’ve been seeing very high CPU and RAM usage; the graph looks almost like a heartbeat.
Sometimes the load shows this ‘heartbeat’ pattern, but most of the time the machine is running at almost full load.

I just installed Grafana 10.4.2, hoping it would help, but to no avail: shell or SSH access is almost impossible because of the high load.
Using Proxmox 8.1.10 (Bookworm) but Buster for the container.

Before this high load I had assigned 2 vCPUs and 1 GiB of RAM, which worked fine.
Changing it to 4 vCPUs and 2 GiB didn’t make things any better.
Disabling the integration in Home Assistant doesn’t help either.

Any idea what might be wrong?

Sorry if my post is in the wrong place.
If so, can a mod move it please?

Don’t guess without any evidence:

  • enable OpenTelemetry tracing in Grafana and inspect the generated traces
  • increase the log level to debug and inspect the logs

BTW: you have a container, so set memory/CPU limits at the container level; that way you will still be able to use SSH and other apps without interruption.
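Both settings live in grafana.ini, but in a container it is often easier to set them as environment variables. Grafana documents an override convention of GF_<SECTION>_<KEY>, uppercased, with "." and "-" replaced by "_". The sketch below builds those names; it assumes the [log] level key and the [tracing.opentelemetry.otlp] address key used by recent Grafana versions (check the configuration docs for yours), and the collector address localhost:4317 is a hypothetical placeholder.

```python
def grafana_env_var(section, key):
    """Build the env var name Grafana reads to override a grafana.ini option:
    GF_<SECTION>_<KEY>, uppercased, with '.' and '-' replaced by '_'."""
    name = f"GF_{section}_{key}"
    return name.replace(".", "_").replace("-", "_").upper()


# Assumed overrides: debug logging plus an OTLP trace collector.
# localhost:4317 is a hypothetical address; point it at your own collector.
OVERRIDES = {
    ("log", "level"): "debug",
    ("tracing.opentelemetry.otlp", "address"): "localhost:4317",
}


def env_lines(overrides):
    """Render NAME=value lines; only the name is uppercased, not the value."""
    return [f"{grafana_env_var(s, k)}={v}" for (s, k), v in overrides.items()]


if __name__ == "__main__":
    for line in env_lines(OVERRIDES):
        print(line)
```

The printed lines can go into whatever environment file the container or service manager uses. Debug logging is verbose, so it is worth reverting once the cause is found.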

Hi guys, I added 10 GB of swap, and so far the server hasn’t crashed. The dashboards are a little slower, but this time the server isn’t crashing. I’ll keep watching.

But does that really address the root cause (which still seems to be unknown at this point), or does it just sweep the issue under the rug?

Are you using SQLite as Grafana’s backend?

Do you have alerts enabled? If so, how many?

What is the Zabbix synchronization you mention? Can you avoid using that plugin and take another approach to fetching Zabbix data, such as the Zabbix API or direct access to the Zabbix database?
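If the backend is Grafana’s default SQLite database, a quick way to answer the alert question is to count rows directly. This is a minimal sketch under two assumptions: the database sits at /var/lib/grafana/grafana.db (the usual path for Linux package installs), and unified alerting stores its rules in a table named alert_rule. The database is opened read-only so the running Grafana instance is not disturbed.

```python
import sqlite3
import sys

# Assumed default location for Linux package installs; adjust for your setup.
DEFAULT_DB = "/var/lib/grafana/grafana.db"


def count_alert_rules(db_path):
    """Return the number of rows in alert_rule (assumed to be the
    unified-alerting rules table), opening the database read-only."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM alert_rule").fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DB
    print(f"{path}: {count_alert_rules(path)} alert rules")
```

If the query fails because the table does not exist, the instance is either on a different backend or an older alerting setup, which is useful information in itself.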
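As an example of fetching data without the plugin, the Zabbix API is JSON-RPC 2.0 over HTTP, POSTed to the frontend’s api_jsonrpc.php endpoint. The sketch below only builds a history.get request body and sends nothing; the item ID is hypothetical, and authentication is deliberately left out (as far as I know, Zabbix 6.4 and later accept an API token in an "Authorization: Bearer <token>" header).

```python
import json
import time


def history_get_payload(itemids, time_from, time_till, value_type=0,
                        limit=10000, request_id=1):
    """Build a JSON-RPC 2.0 request body for the Zabbix API method history.get.

    value_type 0 selects numeric (float) history. No auth is included here.
    """
    return {
        "jsonrpc": "2.0",
        "method": "history.get",
        "params": {
            "output": "extend",
            "history": value_type,
            "itemids": [str(i) for i in itemids],
            "time_from": int(time_from),  # Unix timestamps, inclusive
            "time_till": int(time_till),
            "sortfield": "clock",
            "sortorder": "ASC",
            "limit": limit,  # cap the number of returned values
        },
        "id": request_id,
    }


if __name__ == "__main__":
    now = int(time.time())
    # Hypothetical item ID; look up real ones with item.get.
    payload = history_get_payload(["23296"], now - 3600, now)
    print(json.dumps(payload, indent=2))
```

Keeping time_from/time_till narrow and setting a limit bounds how much data each request returns, which is the same idea as segmenting queries in the plugin.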

I’m closing this topic since the answers have opened two new questions. If someone wants to ask, please create a new topic and include a link pointing here for context.
