Querier memory leak

During query execution the queriers' RAM usage goes up as expected; however, after the results are returned to Grafana, the RAM doesn't go down. When running the exact same query again, the RAM usage stays at the same level.

I’m running Loki on a VM deployment.

I’m assuming (but not sure) that queriers keep chunks for queries in memory and never get rid of them.

Loki was meant to be deployed in Kubernetes, where a memory limit is much easier to enforce on a container. When a container shuts down due to an OOM, another one quickly takes its place, so never getting rid of chunks in memory makes sense, I guess. However, I don’t have the ability to deploy Loki in Kubernetes and am looking for a solution that fits a VM deployment.

Any help would be greatly appreciated.

You can use systemd parameters in the unit file:

  • for restarts => Restart=always or Restart=on-failure
  • for the memory limit => MemoryMax=
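A minimal sketch of what that could look like as a systemd drop-in override. The service name (loki.service) and the 4G cap are assumptions; adjust them to your deployment:

```ini
# /etc/systemd/system/loki.service.d/override.conf  (hypothetical path)
[Service]
# Restart the querier automatically if it is OOM-killed or crashes
Restart=on-failure
# Hard memory cap; systemd/cgroups kills the service if it exceeds this
MemoryMax=4G
```

After adding the file, reload systemd and restart the service (`systemctl daemon-reload`, then `systemctl restart loki`) for the limits to take effect.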

Thanks for taking the time to answer.

This will probably be our go-to temporary solution.

We did, however, discover that we had misunderstood some querier and frontend configurations; changing them made Loki perform much better. I’ll update here if the problem reoccurs with the new configuration once we get some data from challenging queries.

We are facing the same issue on our k8s setup. Did you make any specific changes to the querier and frontend configurations? And by roughly what percentage did Loki’s performance improve?

Thanks


The problem was a configuration one; it’s actually kind of silly.

frontend_worker:
  parallelism: This should be at most the number of CPU cores in your querier.

This fixed the problem I mentioned in this post: we started getting consistent results, the RAM went down after executing queries, and results came back faster. We had misunderstood what this configuration meant, which was quite foolish on our end.

However, our querying times (for metric queries) are still extremely slow. Without caching, presenting dashboards built on Loki metric queries in Grafana is nearly unusable. We are still investigating this issue.

Moreover, the recommended production configuration suggests setting:

parallelism = (CPU cores per querier) / (number of frontend instances)

We have 2 frontends. I’m not sure exactly how the worker–frontend relationship works, but when we set parallelism to more than half of the querier CPUs (half because we have 2 frontends), we weren’t getting faster results despite CPU usage going up, which is weird. For now we’ve decided to stick with half the querier CPUs for this configuration.
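As a worked example of the formula above, under assumed numbers (8 CPU cores per querier and 2 query frontends, both hypothetical), each querier would run 4 workers per frontend:

```yaml
# Hypothetical sizing: 8 cores per querier / 2 frontend instances = 4
frontend_worker:
  parallelism: 4
```

Since each querier connects this many workers to every frontend, the total concurrency per querier is parallelism × frontends (4 × 2 = 8 here), which is why the formula divides by the frontend count.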

Thank you.
I guess you are using distributed Loki. We were using the Loki monolith, hence the parallelism configs did not help much. So we are trying out distributed Loki now.


Yes, I am running distributed Loki.
If you are running an all-in-one deployment, this memory behavior might be caused by the ingester.
I would suggest taking a look at the following configurations:
  • chunk_retain_period
  • chunk_target_size
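For reference, those two settings live under the ingester block. A sketch with placeholder values (the values below are illustrative assumptions, not recommendations; check the defaults for your Loki version):

```yaml
ingester:
  # How long flushed chunks are retained in memory before being released
  chunk_retain_period: 30s
  # Target compressed chunk size in bytes; larger chunks mean fewer,
  # bigger objects but more RAM held per stream before flushing
  chunk_target_size: 1572864
```

A long retain period or a large target size will both keep more chunk data resident in the ingester, which can look like a leak in an all-in-one deployment.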