Grafana out of memory issue

Hi team, I’m new to Grafana. I’m trying to create alert rules in Grafana with a TimescaleDB data source. My alert is a basic one that takes the last 12 hours of data from a TimescaleDB hypertable, groups on one column to find the sum (see attachment), and raises an alert if the sum goes beyond 4.
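The query is roughly like the following (a simplified sketch; the table and column names here are placeholders, not my actual schema):

```sql
-- Simplified sketch of the alert query (placeholder table/column names)
SELECT
  device_id,
  SUM(value) AS total
FROM metrics
WHERE $__timeFilter("time")   -- alert evaluates over the last 12 hours
GROUP BY device_id;
```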
But when I create this rule and data flows into the source table, Grafana starts consuming memory without any limit or garbage collection, goes beyond 16 GB, and at some point fails and restarts the pod. The error is an out-of-memory error.
Please let me know what I’m doing wrong.




Welcome @krishnaprasadas1

How much RAM do you have on the server running Grafana? What happens if you run the same query outside of Grafana, and how much memory does it consume there?

It has ~32 GB of RAM I guess (I don’t have the exact number with me now since it’s handled by another team), but I can see the usage goes beyond 20 GB in the attached image. The data in TimescaleDB totals ~400 MB (2 million rows). I just tried the query in psql and it uses minimal memory, less than a GB. What confuses me is why the memory keeps increasing and never comes down until it throws the OOM error. I tried the alert rule both with and without a reduce expression, but no luck.
One more thing: data of the same size is used in another Grafana instance with an InfluxDB data source, and there it runs perfectly with a small memory footprint (~200 MB only). Please find attached the alert rule created for InfluxDB.


OK, can you now please try the same query in a basic bar chart panel, not within an alert, and see if it pegs the memory?

I’m trying to narrow down whether the issue is with alerts + TimescaleDB.

I tried the same query in a dashboard, but it only uses around 350 MB of memory.


But did you also set it up to run every minute for 1 hour and do a reduce on the last sum > 4?


What are your Grafana pod memory requests/limits?

I believe the Grafana pod is able to see 32 GB of memory, so there is no pressure for the Go runtime to run garbage collection. But in the meantime you reach the K8s memory limit and the pod is terminated because of OOM.

You may try setting the env variable GOGC=10 for the Grafana pod to modify the default GOGC=100 behaviour. Even better is to set the GOMEMLIMIT env variable, so the Go app (Grafana in this case) is aware of how much memory it has available. You can use the K8s Downward API for that.
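A minimal sketch of what that could look like in the container spec, assuming the container is named grafana (the Downward API injects the memory limit as a plain byte count, which GOMEMLIMIT accepts):

```yaml
# Container spec fragment (sketch) - derive GOMEMLIMIT from the pod's own memory limit
containers:
  - name: grafana
    image: grafana/grafana
    resources:
      limits:
        memory: 4Gi
    env:
      - name: GOMEMLIMIT
        valueFrom:
          resourceFieldRef:
            containerName: grafana
            resource: limits.memory   # injected in bytes, e.g. 4294967296
```

In practice you may want GOMEMLIMIT somewhat below the container limit, so there is headroom for non-heap memory.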


I set GOMEMLIMIT to 3 GB and the pod limit to 4 GB, and now there are no restarts happening, thank you. But it still uses the maximum memory available and runs GC at regular intervals. I’m still not sure why Grafana uses this much memory for a simple alert rule. Are there any optimizations to be done on my alert configuration? The reason I ask is that the same alert runs with 350 MB of memory on InfluxDB (with the same data), and once I changed it to run on TimescaleDB the memory use became very high. I expect that optimizing the alert query in the rule may improve the memory usage, but I’m not sure how to do that. The options in the query builder work differently for different data sources.
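For anyone following along, the settings that stopped the restarts look roughly like this (a sketch of my deployment fragment, using the fixed values mentioned above rather than the Downward API):

```yaml
# Deployment container fragment (sketch) - fixed values
resources:
  limits:
    memory: 4Gi
env:
  - name: GOMEMLIMIT
    value: "3GiB"   # keep the Go heap target below the 4Gi container limit
```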

:man_shrugging: You can try profiling and analyzing the memory/heap usage, but that is very low-level access to a Go app.
You need to decide whether it is worth it: a few hours of your profiling vs. the price of 4 GB of memory.

Thank you guys for the support. The issue got resolved when I changed the $__timeFilter(time) WHERE condition to an explicit time condition like time > NOW() - interval '3 days'. Sharing in case it helps someone.
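In other words, the change was roughly the following (simplified; table and column names are placeholders for my actual schema):

```sql
-- Before: Grafana macro expanded to the alert's evaluation time range
-- WHERE $__timeFilter("time")

-- After: explicit time predicate
SELECT device_id, SUM(value) AS total
FROM metrics
WHERE "time" > NOW() - interval '3 days'
GROUP BY device_id;
```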
