'du' and 'df' showing different usage

Hi,

I have been reading the docs but I’m still missing some critical info about how Loki stores and uses data, because my main storage has been filling up seemingly at random. I have Loki deployed via the Helm chart.

This is a (fairly summarized) version of the values.yaml I’m using, in case it’s helpful:

  compactor:
    working_directory: /var/loki/data/retention
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
    delete_request_store: s3
  ingester:
    chunk_block_size: 262144
    chunk_retain_period: 1m
    chunk_target_size: 1572864 # 1.5MB
    chunk_encoding: snappy
    max_chunk_age: 2h
    chunk_idle_period: 1h
    lifecycler:
      final_sleep: 0s
      ring:
        replication_factor: 1
        heartbeat_timeout: 10m
  persistence:
    enabled: true
    size: 40Gi
  limits_config:
    retention_period: 30d
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 16
    per_stream_rate_limit: 5MB
    per_stream_rate_limit_burst: 15MB
    tsdb_max_query_parallelism: 128
    split_queries_by_interval: 1h
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    compactor_address: '{{ include "loki.compactorAddress" . }}'
  storage:
    bucketNames:
      chunks: loki-boltdb
      ruler: loki-boltdb
    type: s3
    s3:
      endpoint: ...
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules

I’d like to understand why sometimes the df usage spikes quite a bit. This is the output right now:

/ $ du -hs /var/loki
1.1G	/var/loki
/ $ df -h /var/loki
Filesystem                Size      Used Available Use% Mounted on
/dev/zzz               29.2G      1.5G     27.7G   4% /var/loki

But I’ve seen it go to 100% while du still reported very little usage. This is caused by files that Loki keeps open even though they have already been deleted. When that happens, we usually restart the pod and the problem goes away.

Is this a bug in Loki, or is something in my configuration conflicting?


I don’t use filesystem storage with Loki, so I am not really sure what’s happening. A couple of questions:

  1. Are you running du / df from inside your container? Have you tried using lsof to see what’s hanging onto the file descriptors? (See the sketch after this list.)
  2. Not sure why replication_factor is set to 3 when you have only one Loki instance.
  3. Might consider disabling WAL if you are storing Loki data in filesystem.

Hi @tonyswumac, thanks for helping me out.

  1. Yes, I’m running du / df from inside the container.
    I used lsof +L1 and currently it’s mostly TSDB files under /var/loki/tsdb-shipper-cache/ (although there is also one descriptor related to the WAL, /var/loki/wal/checkpoint.053812.tmp/00000000, which is currently occupying 70MB). I plan to check again when the next big spike appears.

  2. I’m using 2 instances of Loki. What do you recommend?

  3. Is it? I don’t really understand what the ingester is actually doing; it isn’t clear to me from the docs. I thought it was sending everything to S3, and that the WAL is important because otherwise I would lose data that is still in memory if I restart the pod.

How do you avoid using the filesystem with Loki? And what do you mean, apart from disabling the WAL? I wouldn’t like to lose any logs (or at least no more than, say, one minute’s worth).

EDIT: I’ll share the full values.yaml here so you can get a better picture:

deploymentMode: SingleBinary
loki:
  image:
    tag: 3.1.0
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    {{- with .Values.loki.server }}
    server:
      {{- toYaml . | nindent 2}}
    {{- end}}

    pattern_ingester:
      enabled: {{ .Values.loki.pattern_ingester.enabled }}

    memberlist:
    {{- if .Values.loki.memberlistConfig }}
      {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}
    {{- else }}
    {{- if .Values.loki.extraMemberlistConfig}}
    {{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}
    {{- end }}
      join_members:
        - {{ include "loki.memberlist" . }}
        {{- with .Values.migrate.fromDistributed }}
        {{- if .enabled }}
        - {{ .memberlistService }}
        {{- end }}
        {{- end }}
    {{- end }}

    {{- with .Values.loki.ingester }}
    ingester:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- if .Values.loki.commonConfig}}
    common:
    {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    {{- with .Values.loki.limits_config }}
    limits_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml

    {{- with .Values.chunksCache }}
    {{- if .enabled }}
    chunk_store_config:
      chunk_cache_config:
        default_validity: {{ .defaultValidity }}
        background:
          writeback_goroutines: {{ .writebackParallelism }}
          writeback_buffer: {{ .writebackBuffer }}
          writeback_size_limit: {{ .writebackSizeLimit }}
        memcached:
          batch_size: {{ .batchSize }}
          parallelism: {{ .parallelism }}
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.{{ template "loki.fullname" $ }}-chunks-cache.{{ $.Release.Namespace }}.svc
          consistent_hash: true
          timeout: {{ .timeout }}
          max_idle_conns: 72
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig }}
    schema_config:
    {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- end }}

    {{- if .Values.loki.useTestSchema }}
    schema_config:
    {{- toYaml .Values.loki.testSchemaConfig | nindent 2}}
    {{- end }}

    {{ include "loki.rulerConfig" . }}

    {{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}
    table_manager:
      retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}
      retention_period: {{ .Values.tableManager.retention_period }}
    {{- end }}

    query_range:
      align_queries_with_step: true
      {{- with .Values.loki.query_range }}
      {{- tpl (. | toYaml) $ | nindent 2 }}
      {{- end }}
      {{- if .Values.resultsCache.enabled }}
      {{- with .Values.resultsCache }}
      cache_results: true
      results_cache:
        cache:
          default_validity: {{ .defaultValidity }}
          background:
            writeback_goroutines: {{ .writebackParallelism }}
            writeback_buffer: {{ .writebackBuffer }}
            writeback_size_limit: {{ .writebackSizeLimit }}
          memcached_client:
            consistent_hash: true
            addresses: dnssrvnoa+_memcached-client._tcp.{{ template "loki.fullname" $ }}-results-cache.{{ $.Release.Namespace }}.svc
            timeout: {{ .timeout }}
            update_interval: 1m
      {{- end }}
      {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.compactor }}
    compactor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.analytics }}
    analytics:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.querier }}
    querier:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.index_gateway }}
    index_gateway:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend }}
    frontend:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend_worker }}
    frontend_worker:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.distributor }}
    distributor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    tracing:
      enabled: {{ .Values.loki.tracing.enabled }}
  auth_enabled: false
  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
    grpc_server_max_recv_msg_size: 52434304 # ~50MiB
    grpc_server_max_send_msg_size: 52434304 # ~50MiB
  limits_config:
    retention_period: 30d
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 16
    per_stream_rate_limit: 5MB
    per_stream_rate_limit_burst: 15MB
    tsdb_max_query_parallelism: 128
    split_queries_by_interval: 1h
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    compactor_address: '{{ include "loki.compactorAddress" . }}'
  storage:
    bucketNames:
      chunks: loki-boltdb
      ruler: loki-boltdb
    type: s3
    s3:
      endpoint: ...
      region: ...
      secretAccessKey: ${S3_SECRET_ACCESS_KEY}
      accessKeyId: ${S3_ACCESS_KEY_ID}
      s3ForcePathStyle: true
      sse_encryption: false
      insecure: false
      http_config:
        idle_conn_timeout: 90s
        response_header_timeout: 0s
        insecure_skip_verify: true
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  schemaConfig:
    configs:
      - from: "2024-04-16"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
      - from: "2024-09-24"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  storage_config:
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3
  compactor:
    working_directory: /var/loki/data/retention
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
    delete_request_store: s3
  pattern_ingester:
    enabled: false
  querier:
    max_concurrent: 8
  ingester:
    chunk_block_size: 262144
    chunk_retain_period: 1m
    chunk_target_size: 1572864 # 1.5MB
    chunk_encoding: snappy
    max_chunk_age: 2h
    chunk_idle_period: 1h
    lifecycler:
      final_sleep: 0s
      ring:
        replication_factor: 1
        heartbeat_timeout: 10m

  frontend:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
    log_queries_longer_than: 20s
  frontend_worker:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'

ingress:
  enabled: false

test:
  enabled: false
lokiCanary:
  enabled: false

singleBinary:
  replicas: 2
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: S3_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-boltdb-bucket
          key: AWS_ACCESS_KEY_ID
    - name: S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: loki-boltdb-bucket
          key: AWS_SECRET_ACCESS_KEY
  resources:
    limits:
      cpu: "5"
      memory: 8Gi
    requests:
      cpu: 200m
      memory: 3Gi

  persistence:
    enabled: true
    size: 40Gi
    enableStatefulSetAutoDeletePVC: false

write:
  replicas: 0
read:
  replicas: 0
backend:
  replicas: 0

resultsCache:
  enabled: true
  replicas: 2

chunksCache:
  enabled: true
  replicas: 2

gateway:
  enabled: false

loki:
  limits_config:
    retention_stream:
      - selector: '{app="..."}'
        priority: 1
        period: 7d

(My values.yaml is actually split across multiple files; I pasted everything together here, apologies if that creates any confusion.)
Thank you

I don’t see anything obviously wrong with the configuration. I’d say if you can confirm that your chunks are written to S3, and that you aren’t running out of space in your persistent volume for WAL, you should be good.

I wouldn’t be too concerned about differences between du and df. They count blocks differently; I don’t remember the details, but I am sure Google can tell you.


I increased the volume to 100GB, so let’s see if giving it more room fixes this occasional, transient issue. Thanks for reviewing my config.

FYI: the difference between du and df (I wrote dh in the title, but it’s actually df) becomes noticeable whenever a deleted file is still held open by a process. Almost all of the time there are files under the WAL and TSDB directories that have already been deleted, but Loki still has their file descriptors open, so df keeps counting their size while du does not. I suspect this is what is filling up the disks.

I honestly haven’t run into this. Our Loki write targets stay alive for about 100 days on average (refreshed whenever we refresh the cluster nodes), and I’ve never seen this. If you install and run lsof in your Loki container you might be able to find out what’s holding the files open.

Some additional info.

~ ❯ k exec -it loki-stack-1 -c loki -- df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                 874.7G    355.4G    474.8G  43% /
tmpfs                    64.0M         0     64.0M   0% /dev
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /tmp
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /rules
/dev/rbd19               98.1G     33.9G     64.2G  35% /var/loki
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /etc/hosts
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /dev/termination-log
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /etc/hostname
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /etc/resolv.conf
shm                      64.0M         0     64.0M   0% /dev/shm
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /etc/loki/config
/dev/mapper/vg0-root    874.7G    355.4G    474.8G  43% /etc/loki/runtime-config
tmpfs                   503.2G     12.0K    503.2G   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   251.6G         0    251.6G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                   251.6G         0    251.6G   0% /proc/scsi
tmpfs                   251.6G         0    251.6G   0% /sys/firmware

and:

~ ❯ k exec -it loki-stack-1 -c loki -- du -hs /var/loki
735.5M	/var/loki

And as I shared in past comments, this is mostly coming from tsdb-shipper-cache or the WAL:

~ ❯ k exec -it loki-stack-1 -c loki -- lsof +L1
1	/usr/bin/loki	0	/dev/null
1	/usr/bin/loki	1	pipe:[1930131713]
1	/usr/bin/loki	2	pipe:[1930131714]
1	/usr/bin/loki	3	socket:[1929276169]
1	/usr/bin/loki	4	anon_inode:[eventpoll]
1	/usr/bin/loki	5	pipe:[1929426731]
1	/usr/bin/loki	6	pipe:[1929426731]
1	/usr/bin/loki	7	socket:[1929276173]
1	/usr/bin/loki	8	socket:[3945572279]
1	/usr/bin/loki	9	socket:[1930121098]
1	/usr/bin/loki	10	/var/loki/data/retention/deletion/delete_requests/delete_requests
1	/usr/bin/loki	11	socket:[3947680960]
1	/usr/bin/loki	12	socket:[3927301334]
1	/usr/bin/loki	13	socket:[3819566135]
1	/usr/bin/loki	14	socket:[218707]
1	/usr/bin/loki	15	socket:[3947682238]
1	/usr/bin/loki	16	socket:[3944805681]
1	/usr/bin/loki	17	socket:[3842777244]
1	/usr/bin/loki	19	socket:[3823830383]
1	/usr/bin/loki	20	socket:[360997018]
1	/usr/bin/loki	21	socket:[360997045]
1	/usr/bin/loki	22	/var/loki/tsdb-shipper-active/multitenant/index_20039/1731403822-loki-stack-1-1729494248949849743.tsdb
1	/usr/bin/loki	23	socket:[3945093985]
1	/usr/bin/loki	25	socket:[736276]
1	/usr/bin/loki	27	socket:[125559117]
1	/usr/bin/loki	28	/var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731404722/00000000
1	/usr/bin/loki	29	socket:[3946923576]
1	/usr/bin/loki	30	socket:[3818084939]
1	/usr/bin/loki	31	socket:[3930553114]
1	/usr/bin/loki	32	socket:[2930171631]
1	/usr/bin/loki	34	socket:[1928701746]
1	/usr/bin/loki	35	socket:[3946815975]
1	/usr/bin/loki	36	socket:[2077132112]
1	/usr/bin/loki	39	socket:[1931556898]
1	/usr/bin/loki	40	socket:[1928677289]
1	/usr/bin/loki	41	socket:[360997019]
1	/usr/bin/loki	42	/var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731404722/00000000
1	/usr/bin/loki	43	socket:[3819522947]
1	/usr/bin/loki	44	socket:[1930240870]
1	/usr/bin/loki	45	socket:[1929405119]
1	/usr/bin/loki	46	socket:[1930109396]
1	/usr/bin/loki	47	socket:[1930862925]
1	/usr/bin/loki	48	socket:[1929597839]
1	/usr/bin/loki	49	socket:[1931219998]
1	/usr/bin/loki	50	socket:[3927626286]
1	/usr/bin/loki	51	socket:[3944109364]
1	/usr/bin/loki	52	socket:[1929529069]
1	/usr/bin/loki	53	socket:[1931220009]
1	/usr/bin/loki	54	socket:[1928678204]
1	/usr/bin/loki	55	socket:[1931556943]
1	/usr/bin/loki	56	socket:[1930498351]
1	/usr/bin/loki	57	socket:[1930884667]
1	/usr/bin/loki	58	socket:[1929401296]
1	/usr/bin/loki	59	socket:[1930139269]
1	/usr/bin/loki	60	socket:[1930835212]
1	/usr/bin/loki	61	socket:[1930496833]
1	/usr/bin/loki	62	socket:[1931220303]
1	/usr/bin/loki	63	socket:[3821067910]
1	/usr/bin/loki	64	socket:[1991872741]
1	/usr/bin/loki	69	socket:[1991136982]
1	/usr/bin/loki	70	socket:[3941884916]
1	/usr/bin/loki	71	socket:[3928354402]
1	/usr/bin/loki	72	/var/loki/tsdb-shipper-cache/index_20039/fake/1731404139939417407-compactor-1731356607217-1731403812832-90a5bb0a.tsdb
1	/usr/bin/loki	74	socket:[359547452]
1	/usr/bin/loki	82	/var/loki/tsdb-shipper-cache/index_20039/1731403817-loki-stack-0-1728284179805865181.tsdb
1	/usr/bin/loki	83	/var/loki/tsdb-shipper-cache/index_20039/1731403822-loki-stack-1-1729494248949849743.tsdb
1	/usr/bin/loki	84	/var/loki/wal/00006414
1	/usr/bin/loki	85	/var/loki/wal/checkpoint.006413.tmp/00000000
1	/usr/bin/loki	452	socket:[3822514858]

I really think this is something the developers should look into. Already-deleted files that Loki keeps open are a waste of storage resources, and the usage keeps growing, so it ends up corrupting the storage, or in the best case simply stopping Loki. Why do I need a 100GB volume (per Loki replica) when only about 2GB is actually used? And why do I have to restart the StatefulSet manually from time to time so that it doesn’t hit that 100GB limit?

In your lsof output I see 6 lines for the shipper cache; which of those are files that have already been deleted?

I just double checked again, on our writer container that’s been up for 80-ish days I don’t see any open file descriptors from Loki pointing to a file that no longer exists.


Let me show you more data as of now. There was a restart last night, so currently I can only show the inconsistency from the last 5 hours (which is a small difference, but it is still there):

~ ❯ k exec -it loki-stack-0 -c loki -- df -h   
/dev/rbd16               98.2G    957.9M     97.3G   1% /var/loki
...
~ ❯ k exec -it loki-stack-0 -c loki -- du -hs /var/loki
248.6M	/var/loki
~ ❯ k exec -it loki-stack-1 -c loki -- lsof +L1
1	/usr/bin/loki	10	/var/loki/data/retention/deletion/delete_requests/delete_requests
1	/usr/bin/loki	46	/var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731492032/00000000
1	/usr/bin/loki	47	/var/loki/tsdb-shipper-active/multitenant/index_20040/1731491132-loki-stack-1-1713781691510093445.tsdb
1	/usr/bin/loki	49	/var/loki/wal/00061563
1	/usr/bin/loki	53	/var/loki/tsdb-shipper-cache/index_20040/fake/1731492263966231020-compactor-1731447411558-1731492014228-c6513adb.tsdb
1	/usr/bin/loki	57	/var/loki/wal/checkpoint.061562.tmp/00000000
1	/usr/bin/loki	64	/var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731492032/00000000
1	/usr/bin/loki	93	/var/loki/tsdb-shipper-cache/index_20039/fake/1731474261618529089-compactor-1731339288343-1731465277302-8833e692.tsdb
...

For each one:

~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/data/retention/deletion/delete_requests/delete_requests
  16.0K -rw-rw-r--    1 loki     loki       16.0K May  7  2024 /var/loki/data/retention/deletion/delete_requests/delete_requests
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731492032/00000000
  80.0K -rw-r--r--    1 loki     loki       75.6K Nov 13 10:15 /var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731492032/00000000
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-active/multitenant/index_20040/1731491132-loki-stack-1-1713781691510093445.tsdb
 128.0K -rw-r--r--    1 loki     loki      127.5K Nov 13 10:00 /var/loki/tsdb-shipper-active/multitenant/index_20040/1731491132-loki-stack-1-1713781691510093445.tsdb
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/wal/00061563
 159.6M -rw-r--r--    1 loki     loki      159.6M Nov 13 10:14 /var/loki/wal/00061563
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-cache/index_20040/fake/1731492263966231020-compactor-1731447411558-1731492014228-c6513adb.tsdb
   7.1M -rw-r--r--    1 loki     loki        7.1M Nov 13 10:04 /var/loki/tsdb-shipper-cache/index_20040/fake/1731492263966231020-compactor-1731447411558-1731492014228-c6513adb.tsdb
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/wal/checkpoint.061562.tmp/00000000
ls: /var/loki/wal/checkpoint.061562.tmp/00000000: No such file or directory
command terminated with exit code 1
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731492032/00000000
ls: /var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731492032/00000000: No such file or directory
command terminated with exit code 1
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-cache/index_20039/fake/1731474261618529089-compactor-1731339288343-1731465277302-8833e692.tsdb
  16.4M -rw-r--r--    1 loki     loki       16.4M Nov 13 08:04 /var/loki/tsdb-shipper-cache/index_20039/fake/1731474261618529089-compactor-1731339288343-1731465277302-8833e692.tsdb

I am not sure what’s happening with your cluster; I can only say that the WAL filling up has never been a problem for us (4 writers, 50GB each for the WAL). Perhaps try opening a bug report on GitHub.

I’d say it may be useful to include the same data as in your latest reply (lsof plus an individual ls for each file to verify whether it still exists) when your WAL disk is almost full. It would be interesting to see how many file handles are being left open.


Hi all,

I’m facing the same issue: du is showing ~137M but df ~887M.

I’m storing data in Azure, but keeping the last 24 hours locally for faster responses.

Maybe there’s something wrong in my values?

loki:
  auth_enabled: false
  storage:
    bucketNames:
      chunks: loki
      ruler: loki
      admin: loki
    type: azure
    azure:
      accountName: xxxxxxxxxxxx
      useManagedIdentity: true
      useFederatedToken: false
      userAssignedId: "xxxxxxxxxxxx"
  compactor:
    working_directory: /var/loki/data/boltdb-shipper-compactor
  storage_config:
    boltdb_shipper:
      active_index_directory: /var/loki/data/boltdb-shipper-active
      cache_location: /var/loki/data/boltdb-shipper-cache
      cache_ttl: 24h
    tsdb_shipper:
      active_index_directory: /var/loki/data/tsdb-index
      cache_location: /var/loki/data/tsdb-cache
    filesystem:
      directory: /var/loki/data/chunks
  schemaConfig:
    configs:
    - from: "2020-12-11"
      index:
        period: 24h
        prefix: index_
      object_store: azure
      schema: v11
      store: boltdb-shipper
    - from: "2025-02-04"
      index:
        prefix: index_
        period: 24h
      object_store: azure
      schema: v13
      store: tsdb
  ingester:
    chunk_block_size: 3145728 # 3mb - The targeted _uncompressed_ size in bytes of a chunk block
    max_chunk_age: 24h
    chunk_idle_period: 60m
    chunk_retain_period: 30m
    chunk_target_size: 524288 # 0.5mb - a target _compressed_ size in bytes for the chunk
  commonConfig:
    replication_factor: 1
  limits_config:
    max_concurrent_tail_requests: 100
    split_queries_by_interval: 24h
    max_query_parallelism: 100
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    ingestion_rate_mb: 20
    ingestion_burst_size_mb: 50
    max_cache_freshness_per_query: 10m
    max_streams_per_user: 0
  query_scheduler:
    max_outstanding_requests_per_tenant: 4096
  frontend:
    max_outstanding_per_tenant: 4096
    compress_responses: true
  memcached:
    chunk_cache:
      enabled: true
      host: chunk-cache-memcached.loki.svc
      service: memcached-client
      batch_size: 256
      parallelism: 10
    results_cache:
      enabled: true
      host: results-cache-memcached.loki.svc
      service: memcached-client
      default_validity: 12h
  serviceAccount:
    name: xxxx
    create: false
backend:
  replicas: 1
  persistence:
    storageClass: "managed-csi-premium"
    storageClassName: managed-csi-premium
write:
  replicas: 1
  persistence:
    storageClass: "managed-csi-premium"
    storageClassName: managed-csi-premium
read:
  replicas: 1
memberlist:
  service:
    publishNotReadyAddresses: true
monitoring:
  dashboards:
    enabled: false
  selfMonitoring:
    enabled: false
lokiCanary:
  enabled: false
test:
  enabled: false
compactor:
  replicas: 0