Loki Retention Policy and Chunk Deletion Issue

Hey everyone

We’re running into an issue with our on-prem Loki setup (not on Kubernetes) and wanted to see if anyone else has faced something similar or has ideas on how to fix it.


Our Setup

  • Cluster: 2-node Loki setup (Ingester / Distributor / Querier roles)

  • Promtail: Managed by our DevOps team (runs as a DaemonSet; we don’t have admin access, so any Promtail-side change has to go through them)

  • Storage: Everything is on local filesystem mounts:

    /data/loki/wal  
    /data/loki/chunks  
    /data/loki/index
    
    
  • Retention: Configured in loki-config.yaml for 5 days

    • We’ve tuned parameters like chunk block size (2 MB) to optimize space usage
  • Deletion API: We’ve also tried using the curl deletion API, but understand it only marks data for deletion in the index — it doesn’t immediately remove chunk files.
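
For what it’s worth, this is roughly how we’ve been checking whether those delete requests are even being picked up. The host and tenant below are placeholders, and our understanding from the docs is that the compactor only starts processing a request once delete_request_cancel_period has passed:

    # List delete requests known to the compactor, with their status
    # (<loki_host> and <tenant_id> are placeholders for our environment)
    curl -s -H "X-Scope-OrgID: <tenant_id>" \
      "http://<loki_host>:3100/loki/api/v1/delete"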


What We’re Observing

  1. Chunk file timestamps keep changing
    Even old chunk files under /data/loki/chunks keep getting their “modified date” updated.
    When we check with:

    ls -ltrh /data/loki/chunks

    we can see the modification dates changing regularly, even for files that should be past retention (see the checks sketched after this list).

  2. Retention logic not kicking in
    We wrote a small custom cleanup script to delete chunks older than 5 days, but since the modified date keeps updating, nothing ever qualifies for deletion.

  3. Disk usage keeps going up
    Despite retention being set to 5 days (and even after testing the deletion API), the /data/loki/chunks folder size keeps increasing.
    It doesn’t seem like the old data is actually being removed.
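
Here is roughly what we’ve been running to pin down which timestamp is actually moving and where the space is going. GNU coreutils/findutils are assumed, and the <tenant>/<chunk_file> path is just a placeholder:

    # Show access/modify/change times for one sample chunk file
    # (<tenant>/<chunk_file> is a placeholder path)
    stat /data/loki/chunks/<tenant>/<chunk_file>

    # Ten most recently touched chunk files, to see what keeps being rewritten
    find /data/loki/chunks -type f -printf '%T@ %p\n' | sort -n | tail -10

    # Rough view of where the space is going
    du -sh /data/loki/chunks/* | sort -h | tail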


Questions

  • Why would Loki be updating chunk file modification timestamps?

  • How can we make sure chunks are actually deleted after retention (and not just marked)?

  • Is there a better approach to enforce retention-based cleanup when running Loki on filesystem storage — not in Kubernetes and not using object storage?

  • Could something like WAL replay or a background process (e.g. compactor) be touching those files and resetting timestamps?
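
On that last question, this is roughly how we’ve been trying to answer it ourselves. The host is a placeholder, and our understanding is that with boltdb-shipper only a single compactor instance should be running retention, which might matter on a 2-node setup if both nodes run all targets:

    # Which modules are running on this node (the compactor has to be running
    # somewhere, exactly once, for retention to delete anything)
    curl -s "http://<loki_host>:3100/services"

    # Look for compactor / retention activity in the metrics; exact metric names
    # differ between versions, so we grep loosely
    curl -s "http://<loki_host>:3100/metrics" | grep -iE 'compactor|retention' | head -40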

Loki version: 2.9.3

loki-config.yaml:

    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 5m
      chunk_retain_period: 30s
      max_transfer_retries: 0
      chunk_block_size: 2097152
      chunk_encoding: snappy
      max_chunk_age: 2h

    distributor:
      ring:
        kvstore:
          store: memberlist

    querier:
      engine:
        timeout: 1m
      query_ingesters_within: 2h

    query_range:
      align_queries_with_step: true
      max_retries: 5
      parallelise_shardable_queries: true

    frontend:
      log_queries_longer_than: 5s
      compress_responses: true

    frontend_worker:
      frontend_address: X.X.X.X

    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/index
        cache_location: /data/loki/index_cache
        shared_store: filesystem
      filesystem:
        directory: /data/loki/chunks

    compactor:
      working_directory: /data/loki/compactor
      shared_store: filesystem
      retention_enabled: true
      delete_request_cancel_period: 1m
      deletion_mode: filter-and-delete

    limits_config:
      ingestion_burst_size_mb: 500
      ingestion_rate_mb: 1024
      ingestion_rate_strategy: global
      max_cache_freshness_per_query: 10m
      max_global_streams_per_user: 50000
      max_query_parallelism: 128
      per_stream_rate_limit: 500MB
      per_stream_rate_limit_burst: 500MB
      reject_old_samples: true
      reject_old_samples_max_age: 120h
      retention_period: 5d
      split_queries_by_interval: 15m
      enforce_metric_name: false
      max_entries_limit_per_query: 80000
      allow_deletes: true

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: true
      retention_period: 5d
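
To double-check that the retention settings above are what the running process actually loaded (and not something overridden at startup), we also dump the effective config over HTTP; the host is a placeholder:

    # Effective runtime configuration as seen by the running Loki process
    curl -s "http://<loki_host>:3100/config" | grep -nE 'retention|deletion_mode|compactor:'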

Delete API curl:

    curl -G -X POST "http://<loki_host>:3100/loki/api/v1/delete" \
      -H "X-Scope-OrgID: <tenant_id>" \
      --data-urlencode 'query={job="varlogs"} |~ "sensitive_data"' \
      --data-urlencode 'start=<start_timestamp>' \
      --data-urlencode 'end=<end_timestamp>'

If anyone has seen this behavior or has tips on how to manage chunk cleanup and ensure retention works effectively in a non-K8s setup, please share your thoughts.

Thanks in advance!

Try disabling the table manager. With boltdb-shipper, retention should be handled by the compactor, so having the table_manager block configured as well just muddies the water.

Also, you can’t run a multi-node Loki cluster on a local filesystem (each node only sees its own chunks and index), so I suspect you are already not getting all your logs.

Thanks for your response!

You mentioned that I might already not be getting all my logs — yes, that’s true. I’m definitely seeing a drop in logs.

I wanted to ask something related: I have Promtail running in production, and it’s configured to send logs to two Loki instances:

  1. One Loki is running on Kubernetes and uses MinIO as the object store.

  2. The other Loki is running as a standalone service, storing data directly on the filesystem.

The problem is that I’m seeing log drops only in the filesystem-based Loki setup, while the MinIO-based one works fine.

Do you have any idea why this might be happening? Could it be related to how Loki handles writes on the filesystem (like WAL or chunk storage performance)? Or maybe something on the Promtail side, like queue or batch limits?
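
In case it helps narrow things down, this is roughly what I’ve been checking so far. The hosts are placeholders, 9080 is just Promtail’s default HTTP port, and I’m going off the metric names I believe Promtail and Loki expose:

    # On the Promtail side: entries sent vs dropped, per client endpoint
    curl -s "http://<promtail_host>:9080/metrics" | grep -E 'promtail_(sent|dropped)_entries_total'

    # On the filesystem-based Loki: samples it rejected, broken down by reason
    # (rate limited, too old, per-stream limits, ...)
    curl -s "http://<loki_host>:3100/metrics" | grep 'loki_discarded_samples_total'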

Any insights would be really helpful.

Thanks again for your time and help!