Loki k8s single binary log retention configuration - not deleting logs

I have installed the Grafana Loki single binary in my Kubernetes cluster using the Helm chart. Everything works great except that my persistent storage (filesystem) is filling up. I have read the storage retention configuration docs from Grafana and many posts here and elsewhere about this. I believe that I have configured my Loki installation to remove logs using the compactor, but my persistent volume keeps filling up.

I am using version 3.1.0 of the Loki Helm chart (loki-3.1.0.tgz) to install version 2.6.1 of the Loki image (grafana/loki:2.6.1).
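For reference, the install itself is a standard Helm deployment along these lines (a rough sketch; it assumes the chart comes from the grafana Helm repository and that the release is named loki in the logging namespace, matching the pod name used later in this post):

helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install loki grafana/loki --version 3.1.0 \
  --namespace logging --create-namespace -f values.yaml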

Here is my values.yaml file that I am using to install Loki:

# fullnameOverride: loki

# global:
#   image:
#     registry: null

monitoring:
  dashboards:
    enabled: false
  rules:
    enabled: false
  alerts:
    enabled: false
  serviceMonitor:
    enabled: false    
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false    
    lokiCanary:
      enabled: false 

loki:
  image:
    # -- The Docker registry
    registry: harbor.fractilia.com/library
    # -- Docker image repository
    repository: grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: 2.6.1
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  # Should authentication be enabled
  auth_enabled: false
  storage:
    type: filesystem


  compactor:
    shared_store: filesystem
    working_directory: /var/loki/boltdb-shipper-compactor
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 1h
    retention_delete_worker_count: 100

  limits_config:
    retention_period: 2d

  storage_config:
    boltdb_shipper:
      active_index_directory: /var/loki/boltdb-shipper-active
      cache_location: /var/loki/boltdb-shipper-cache
      cache_ttl: 24h
      shared_store: filesystem
    filesystem:
      directory: /var/loki/chunks

  # commonConfig:
  #   path_prefix: /var/loki
  #   replication_factor: 1

  # server:
  #   log_level: debug

  # NOTE: We need the chunk_store_config and ingester settings, and I don't see another way of getting them into the config.
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    {{- with .Values.loki.server }}
    server:
      {{- toYaml . | nindent 2}}
    {{- end}}

    memberlist:
      join_members:
        - {{ include "loki.memberlist" . }}

    {{- if .Values.loki.commonConfig}}
    common:
    {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    {{- with .Values.loki.limits_config }}
    limits_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.memcached.chunk_cache }}
    {{- if and .enabled .host }}
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: {{ .batch_size }}
          parallelism: {{ .parallelism }}
        memcached_client:
          host: {{ .host }}
          service: {{ .service }}
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig}}
    schema_config:
    {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- else }}
    schema_config:
      configs:
        - from: 2022-01-11
          store: boltdb-shipper
          {{- if eq .Values.loki.storage.type "s3" }}
          object_store: s3
          {{- else if eq .Values.loki.storage.type "gcs" }}
          object_store: gcs
          {{- else }}
          object_store: filesystem
          {{- end }}
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
    {{- end }}

    {{- if or .Values.minio.enabled (eq .Values.loki.storage.type "s3") (eq .Values.loki.storage.type "gcs") }}
    ruler:
      storage:
      {{- include "loki.rulerStorageConfig" . | nindent 4}}
    {{- end -}}

    {{- with .Values.loki.memcached.results_cache }}
    query_range:
      align_queries_with_step: true
      {{- if and .enabled .host }}
      cache_results: {{ .enabled }}
      results_cache:
        cache:
          default_validity: {{ .default_validity }}
          memcached_client:
            host: {{ .host }}
            service: {{ .service }}
            timeout: {{ .timeout }}
      {{- end }}
    {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.compactor }}
    compactor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    chunk_store_config:
      max_look_back_period: "0s"
    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 30m
      chunk_retain_period: 1m
      lifecycler:
        ring:
          replication_factor: 1
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal  

    
  # TODO: There might be nothing to do here.
  # memberlist:
  #   abort_if_cluster_join_fails: false
  # join_members:
  # - loki-memberlist
  # - loki-memberlist.logging.svc.cluster.local

singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
  # -- Resource requests and limits for the single binary
  resources: {}
  # -- Node selector for single binary pods
  nodeSelector: {}
  persistence:
    # -- Size of persistent disk
    size: 500Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: "fame-storage-vsan-policy"

This creates the following Loki config file:

apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    chunk_store_config:
      max_look_back_period: 0s
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        filesystem:
          chunks_directory: /var/loki/chunks
          rules_directory: /var/loki/rules
    compactor:
      compaction_interval: 10m
      retention_delete_delay: 1h
      retention_delete_worker_count: 100
      retention_enabled: true
      shared_store: filesystem
      working_directory: /var/loki/boltdb-shipper-compactor
    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 30m
      chunk_retain_period: 1m
      lifecycler:
        ring:
          replication_factor: 1
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      retention_period: 2d
      split_queries_by_interval: 15m
    memberlist:
      join_members:
      - loki-memberlist
    query_range:
      align_queries_with_step: true
    schema_config:
      configs:
      - from: "2022-01-11"
        index:
          period: 24h
          prefix: loki_index_
        object_store: filesystem
        schema: v12
        store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: filesystem
      filesystem:
        directory: /var/loki/chunks
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
kind: ConfigMap
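I pulled this config out of the ConfigMap that the chart generates. Assuming the default naming from a release called loki, something like this shows it:

kubectl get configmap loki -n logging -o yaml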

It looks like this is configured to delete logs after 2 days (I initially had 3), but the usage in my persistent volume keeps going up even after a week of running.

Is there something that I am missing in this configuration to get log retention working correctly?
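One quick way to narrow down what is actually growing is to check disk usage inside the pod; with my directory layout that would be something like:

kubectl exec -n logging loki-0 -- du -sh /var/loki/chunks /var/loki/boltdb-shipper-active /var/loki/boltdb-shipper-cache /var/loki/boltdb-shipper-compactor /var/loki/wal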

I believe that I have found the issue. The compactor's retention marker processing was failing because it writes temporary marker files to /tmp, and /tmp is part of the read-only filesystem in the image.

I ran:
kubectl logs loki-0 -n logging

It gave me a number of log messages like:
level=warn ts=2023-01-05T20:58:48.828870651Z caller=marker.go:214 msg="failed to process marks" path=/var/loki/boltdb-shipper-compactor/retention/markers/1672871457891585205 err="open /tmp/marker-view-2316940776: read-only file system"
level=warn ts=2023-01-05T20:58:48.828882132Z caller=marker.go:214 msg="failed to process marks" path=/var/loki/boltdb-shipper-compactor/retention/markers/1672872057887901626 err="open /tmp/marker-view-1660263759: read-only file system"
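As far as I can tell, the chart runs the Loki container with a restrictive security context that includes a read-only root filesystem, so anything Loki writes outside its mounted volumes fails, including the compactor's temporary marker-view files in /tmp. You can check the container security context with something like this (assuming the StatefulSet is named loki):

kubectl get statefulset loki -n logging -o jsonpath='{.spec.template.spec.containers[0].securityContext}'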

To fix this problem, I added some extra volume configuration to the singleBinary section of my values.yaml file for the Helm deployment, mounting a writable emptyDir at /tmp:

singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
  # -- Resource requests and limits for the single binary
  resources: {}
  # -- Node selector for single binary pods
  nodeSelector: {}
  persistence:
    # -- Size of persistent disk
    size: 500Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: "fame-storage-vsan-policy"

  # -- Volume mounts to add to the single binary pods
  extraVolumeMounts:
  - name: temporary
    mountPath: /tmp 
  # -- Volumes to add to the single binary pods
  extraVolumes: 
  - name: temporary 
    emptyDir: {}
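Applying the change is just a normal helm upgrade with the updated values, roughly (same assumed release and namespace as above):

helm upgrade loki grafana/loki --version 3.1.0 -n logging -f values.yaml

You can confirm that /tmp is now backed by the emptyDir by checking the Mounts section in the output of kubectl describe pod loki-0 -n logging.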

The growth in my persistent volume has stopped, so it looks like this is working.