Loki Read and Write Pods are consuming a lot of memory

The Loki read and write pods consume so much memory that the node fails from memory exhaustion.
Can someone help with this issue and explain why it might happen and how to fix it?

How heavy is your log traffic? Have you looked at the metrics to see what the memory footprint looks like?

I don’t think Loki rebalances itself (I could be wrong, I haven’t looked closely enough). If you find uneven memory usage like in your screenshot above, it might be a good idea to create one more writer and gracefully restart the one with heavy memory usage.

Your reader looks fine though.

Hi @tonyswumac, thanks for answering my question.
Below are the observations I have made:

  • Loki write pods start at around 700Mi, then memory climbs continuously up to the limit (we set the limit to 2048Mi), and then the pods restart.

  • Loki read pods are very stable and they don’t consume too much memory.

  • After increasing the write pod replicas to 4, I observed the same pattern: the pods start with a small memory footprint, climb to their limits, and then restart.

Is there a way to load balance the traffic across the write pods so that memory consumption is even among them?

@tonyswumac, can you please suggest something for the scenario described above?

Thanks in advance!!

level=warn ts=2023-05-10T05:55:45.663299596Z caller=client.go:379 component=client host=loki-gateway msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 1398101 bytes/sec) while attempting to ingest '1500' lines totaling '1047058' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"

The above log is from promtail pods.
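To put the numbers in that message in perspective, here is a rough Python sketch that pulls the limit, the line count, and the batch size out of the error text and converts them to MiB. The function name `parse_rate_limit_error` and the regular expression are my own illustration, not part of promtail or Loki. Also note that the limit is a per-second rate while the batch is a single push, so the comparison is only approximate.

```python
import re

# Error text copied from the promtail log line above.
MESSAGE = (
    "Ingestion rate limit exceeded for user fake (limit: 1398101 bytes/sec) "
    "while attempting to ingest '1500' lines totaling '1047058' bytes"
)

MIB = 1024 * 1024


def parse_rate_limit_error(message):
    """Extract limit, line count, and batch size from a 429 message (hypothetical helper)."""
    match = re.search(
        r"limit: (\d+) bytes/sec\).*?ingest '(\d+)' lines totaling '(\d+)' bytes",
        message,
    )
    if match is None:
        raise ValueError("not an ingestion rate limit message")
    limit, lines, batch = (int(group) for group in match.groups())
    return {"limit_bytes_per_sec": limit, "lines": lines, "batch_bytes": batch}


info = parse_rate_limit_error(MESSAGE)
limit_mib = info["limit_bytes_per_sec"] / MIB      # rate limit in MiB per second
batch_mib = info["batch_bytes"] / MIB              # size of the rejected batch in MiB
avg_line_bytes = info["batch_bytes"] / info["lines"]

print(f"limit: {limit_mib:.2f} MiB/s")
print(f"batch: {batch_mib:.2f} MiB in {info['lines']} lines")
print(f"average line size: {avg_line_bytes:.0f} bytes")
```

So the rejected batch is about 1 MiB against a limit of roughly 1.33 MiB per second, with lines averaging around 700 bytes each.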

Do you think that the write pods are failing because of the massive amount of logs ingested by promtail?

No, I don’t think 1500 lines per second (or 1047058 bytes per second, which is only roughly 1MB) is heavy. I would recommend checking a couple of things:

  1. Loki components expose metrics (by default at /metrics). I’d look at those and see if you can determine why the writers are unbalanced. Things such as the number of streams, the number of chunks, and the age of chunks may be of interest.

  2. Your Loki writer containers should be behind some sort of load balancer, if they currently aren’t.

  3. Check your configuration and see how long you are keeping chunks in memory.

  4. Sharing your Loki configuration might help.
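
For the first point, comparing writers could look something like the Python sketch below. It reads Prometheus text-format output and sums one metric per pod. The pod names and sample values are made up for illustration. The metric names `loki_ingester_memory_streams` and `loki_ingester_memory_chunks` are what I would expect an ingester to expose, but treat them as assumptions and check the names in your own /metrics output.

```python
def sum_metric(exposition_text, metric_name):
    """Sum all samples of one metric in Prometheus text exposition format."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Sample with labels: name{labels} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Sample without labels: name value [timestamp]
            name, _, rest = line.partition(" ")
        if name != metric_name:
            continue
        fields = rest.split()
        if fields:
            total += float(fields[0])
    return total


# Made-up /metrics excerpts, one per write pod (hypothetical names and values).
SAMPLE_BY_POD = {
    "loki-write-0": (
        "# TYPE loki_ingester_memory_streams gauge\n"
        'loki_ingester_memory_streams{tenant="fake"} 1200\n'
        "loki_ingester_memory_chunks 5400\n"
    ),
    "loki-write-1": (
        "# TYPE loki_ingester_memory_streams gauge\n"
        'loki_ingester_memory_streams{tenant="fake"} 150\n'
        "loki_ingester_memory_chunks 600\n"
    ),
}

streams_by_pod = {
    pod: sum_metric(text, "loki_ingester_memory_streams")
    for pod, text in SAMPLE_BY_POD.items()
}

for pod, streams in sorted(streams_by_pod.items()):
    chunks = sum_metric(SAMPLE_BY_POD[pod], "loki_ingester_memory_chunks")
    print(f"{pod}: {streams:.0f} streams, {chunks:.0f} chunks in memory")
```

In a real cluster you would replace the sample strings with the text fetched from each write pod's /metrics endpoint, for example through a port-forward. A large gap in streams or chunks between pods would point to uneven distribution rather than overall log volume.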