I noticed an interesting issue with our Loki ingesters: the WAL directory is filling up the filesystem.
The relevant config:
wal:
  enabled: true
  dir: /var/loki/wal
  checkpoint_duration: 5m0s
  flush_on_shutdown: true
  replay_memory_ceiling: 4GB
And yet:
/dev/nvme2n1 19.5G 19.5G 0 100% /var/loki
I did not find anything obvious. Assuming replay_memory_ceiling does what I think it should (limit the WAL to 4GB), we should not be filling up a 20GB filesystem like this.
Is there a config change I need to make to cap the WAL at a certain size? Or do I just need to figure out a larger WAL size per instance somehow (based on ingestion rate)?
What could cause the WAL to fill up like this?
I think you may have misunderstood the replay_memory_ceiling configuration.
According to the documentation:
# Maximum memory size the WAL may use during replay. After hitting this, it
# will flush data to storage before continuing. A unit suffix (KB, MB, GB) may
# be applied.
# CLI flag: -ingester.wal-replay-memory-ceiling
[replay_memory_ceiling: <int> | default = 4GB]
Replay is what happens when ingesters exit unexpectedly and, on restart, try to replay what’s in the WAL directory. This setting has nothing to do with how big your WAL directory gets.
If you are running out of disk space on the WAL volume, I think the only option is to make it bigger.
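As I understand it, what actually drives the on-disk WAL size is how much data the ingester has accepted but not yet flushed to storage, so the flush-related ingester settings are the ones that indirectly bound it. A rough sketch of where they live (values are illustrative, not recommendations, and defaults depend on your Loki version):

ingester:
  chunk_idle_period: 30m        # flush chunks for streams that have stopped receiving logs
  max_chunk_age: 1h             # force-flush chunks after this age, even for active streams
  wal:
    enabled: true
    dir: /var/loki/wal
    checkpoint_duration: 5m     # how often the WAL is checkpointed so older segments can be dropped
    replay_memory_ceiling: 4GB  # only limits memory during replay after a crash, not disk usage

Shorter flush intervals mean less unflushed data sitting in the WAL at any moment, at the cost of more, smaller chunks landing in object storage.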
Does anyone know of a way to limit the size of the WAL stored per pod? Just throwing more storage at it feels wrong; there has to be some way to limit it, or at least a way to calculate the storage needed.
The load test I’m running has 300 random application names with the same log content. We are using JMeter to post it over and over at different numbers of virtual users.
I can see the WAL directory grow and shrink in size. It’s just not clear to me how to go about sizing it.
So based on the load generation, I ended up with a 60GB data volume for the ingester pod. I’m sure we can still fill it up, but the ingest rate of the pod itself is the limiting factor now, not the WAL space.
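In case it helps anyone doing the same, here is roughly what that looks like as a standalone PVC for the ingester data volume (a sketch with illustrative names; if you deploy via a Helm chart or StatefulSet, the volumeClaimTemplates handle this for you):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-ingester-data      # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 60Gi             # sized empirically from the load test, with headroom above the peak WAL usage observed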