Loki Retention Policy and Chunk Deletion Issue

Hey everyone,

We’re running into an issue with our on-prem Loki setup (not on Kubernetes) and wanted to see if anyone else has faced something similar or has ideas on how to fix it.


Our Setup

  • Cluster: 2-node Loki setup (Ingester / Distributor / Querier roles)

  • Promtail: Managed by our DevOps team (runs as a DaemonSet; we don’t have admin access, so there’s a dependency there)

  • Storage: Everything is on local filesystem mounts:

    /data/loki/wal  
    /data/loki/chunks  
    /data/loki/index
    
    
  • Retention: Configured in loki-config.yaml for 5 days

    • We’ve tuned parameters like chunk block size (2 MB) to optimize space usage
  • Deletion API: We’ve also tried the curl deletion API, but we understand it only marks data for deletion in the index; it doesn’t immediately remove chunk files. (A status-check example follows below.)
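
To check whether a delete request has actually been processed, it can be listed back from the same endpoint; a minimal sketch, assuming the compactor answers on the same host and port as the rest of Loki:

    # List this tenant's delete requests; each entry includes a status field
    # showing whether the compactor has processed it yet or it is still pending.
    curl -s "http://<loki_host>:3100/loki/api/v1/delete" \
      -H "X-Scope-OrgID: <tenant_id>"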


What We’re Observing

  1. Chunk file timestamps keep changing
    Even old chunk files under /data/loki/chunks keep getting their “modified date” updated.
    When we check with:

    ls -ltrh /data/loki/chunks
    
    

    we can see the modification dates changing regularly, even for files that should be past retention.

  2. Retention logic not kicking in
    We wrote a small custom cleanup script to delete chunks older than 5 days (simplified sketch after this list), but since the modified date keeps updating, nothing ever qualifies for deletion.

  3. Disk usage keeps going up
    Despite retention being set to 5 days (and even after testing the deletion API), the /data/loki/chunks folder size keeps increasing.
    It doesn’t seem like the old data is actually being removed.
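
For reference, a simplified sketch of the kind of mtime-based cleanup we tried (the path and 5-day window match our setup; the rest is illustrative):

    #!/bin/sh
    # Delete chunk files not modified in the last 5 days.
    # In practice this never matches anything, because Loki keeps
    # refreshing the mtimes on old chunk files.
    find /data/loki/chunks -type f -mtime +5 -print -delete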


Questions

  • Why would Loki be updating chunk file modification timestamps?

  • How can we make sure chunks are actually deleted after retention (and not just marked)?

  • Is there a better approach to enforce retention-based cleanup when running Loki on filesystem storage — not in Kubernetes and not using object storage?

  • Could something like WAL replay or a background process (e.g. the compactor) be touching those files and resetting timestamps? (One way to watch for this is sketched below.)
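
One way to catch whatever is rewriting the chunk files in the act is to watch the directory with inotify; a sketch, assuming inotify-tools is installed on the host:

    # Monitor the chunks tree (-m keeps watching, -r recurses) and print
    # metadata changes and writes as they happen.
    inotifywait -m -r -e attrib -e modify -e close_write /data/loki/chunks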

Loki version: 2.9.3

loki-config.yaml:

    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 5m
      chunk_retain_period: 30s
      max_transfer_retries: 0
      chunk_block_size: 2097152
      chunk_encoding: snappy
      max_chunk_age: 2h

    distributor:
      ring:
        kvstore:
          store: memberlist

    querier:
      engine:
        timeout: 1m
      query_ingesters_within: 2h

    query_range:
      align_queries_with_step: true
      max_retries: 5
      parallelise_shardable_queries: true

    frontend:
      log_queries_longer_than: 5s
      compress_responses: true

    frontend_worker:
      frontend_address: X.X.X.X

    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/index
        cache_location: /data/loki/index_cache
        shared_store: filesystem
      filesystem:
        directory: /data/loki/chunks

    compactor:
      working_directory: /data/loki/compactor
      shared_store: filesystem
      retention_enabled: true
      delete_request_cancel_period: 1m
      deletion_mode: filter-and-delete

    limits_config:
      ingestion_burst_size_mb: 500
      ingestion_rate_mb: 1024
      ingestion_rate_strategy: global
      max_cache_freshness_per_query: 10m
      max_global_streams_per_user: 50000
      max_query_parallelism: 128
      per_stream_rate_limit: 500MB
      per_stream_rate_limit_burst: 500MB
      reject_old_samples: true
      reject_old_samples_max_age: 120h
      retention_period: 5d
      split_queries_by_interval: 15m
      enforce_metric_name: false
      max_entries_limit_per_query: 80000
      allow_deletes: true

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: true
      retention_period: 5d

Delete API call we tried (note: creating a delete request is a POST, and -G moves the --data-urlencode parameters into the query string, where Loki expects them):

    curl -G -X POST "http://<loki_host>:3100/loki/api/v1/delete" \
      -H "X-Scope-OrgID: <tenant_id>" \
      --data-urlencode 'query={job="varlogs"} |~ "sensitive_data"' \
      --data-urlencode 'start=<start_timestamp>' \
      --data-urlencode 'end=<end_timestamp>'

If anyone has seen this behavior or has tips on how to manage chunk cleanup and ensure retention works effectively in a non-K8s setup, please share your thoughts.
Thanks in advance!

Try disabling the table_manager. With boltdb-shipper, retention is handled by the compactor (which you already have retention_enabled: true on), and running table_manager retention alongside it just muddies the picture.

Also, you can’t run a multi-node Loki cluster on local filesystem storage: each node only sees the chunks it wrote itself, so I suspect you are already not getting all your logs back at query time.
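
A minimal sketch of what compactor-only retention could look like here (drop the table_manager block entirely; the delay/interval values below are the documented defaults, spelled out just to make the timing explicit):

    compactor:
      working_directory: /data/loki/compactor
      shared_store: filesystem
      retention_enabled: true
      retention_delete_delay: 2h          # chunks are deleted this long after being marked
      retention_delete_worker_count: 150  # parallel deletion workers
      compaction_interval: 10m            # how often compaction (and retention) runs

With these settings, chunks can legitimately outlive the 5-day retention_period by the delete delay plus a compaction cycle, but disk usage should stop growing without bound.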