How to tell if retention is working

Working on configuring a Loki instance. Our internal policy on log files is that we store no more than 14 days' worth of logs. I have configured this in Loki using the instructions from the following page:

First I tried the method using the compactor, then the option using the table manager. Both seemed to work in that I can't query data older than 14 days, but I want to verify that the data past retention is truly deleted as far as Loki is concerned. The only reason I have doubt is that our S3 bucket for this shows 774G of space used, while the rough size of the logs is closer to 400G (an estimate, since we compress most of the logs daily). Does it make sense that the Loki index/metadata is roughly the same size as the uncompressed data? Is there a way to examine the actual data Loki is storing outside of a LogQL query?

If you can't query the logs after 14 days then at the very least the index is gone. There is a chance the chunks are still there if your compactor crashes without a persistent volume and the marker file is lost, but unless that's a frequent occurrence it shouldn't pile up quickly.

I'd analyze your S3 bucket a bit and find out where the storage discrepancy is: check the size of the chunk directory, check the size of the index directory, and so on. You can also list all files under the chunk directory and get their ages. Also check whether your S3 bucket has versioning enabled, and whether you have a lifecycle policy to discard incomplete multipart uploads.
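To make the "list files and get their ages" step concrete, here's a rough sketch. It assumes you've dumped a listing first with something like `aws s3 ls s3://YOUR-LOKI-BUCKET --recursive > listing.txt` (the bucket name and filename are placeholders); it then sums bytes per top-level prefix and counts objects modified before the retention window.

```python
# Sketch: analyze an "aws s3 ls --recursive" listing offline.
# Each listing line looks like:
#   2024-01-02 03:04:05    1234567 chunks/fake/0a1b...
from collections import defaultdict
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)

def summarize(lines, now):
    """Sum object sizes per top-level prefix and count objects older than retention."""
    per_prefix = defaultdict(int)
    stale = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank lines and any summary footer
        date, time_, size, key = parts[0], parts[1], int(parts[2]), parts[3]
        per_prefix[key.split("/", 1)[0]] += size
        modified = datetime.strptime(f"{date} {time_}", "%Y-%m-%d %H:%M:%S")
        if now - modified > RETENTION:
            stale += 1
    return dict(per_prefix), stale

# Example run with fabricated listing lines:
sample = [
    "2020-01-01 00:00:00 1048576 chunks/aaaa",
    "2024-01-01 00:00:00     512 index/bbbb",
]
sizes, stale = summarize(sample, datetime(2024, 1, 2))
for prefix, total in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{prefix}: {total} bytes")
print(f"objects past retention: {stale}")
```

If the per-prefix totals show the chunk prefix alone accounting for most of the 774G, or many chunk objects older than 14 days, that points at retention not deleting; if the chunk prefix is small, look at versioning or leftover multipart uploads instead.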

Thanks Tony, that makes sense. To be clear, I'm not sure there is a discrepancy; I just wasn't sure whether 2:1 was an unusual ratio between Loki storage and text log files. I'm pretty sure we have relatively low cardinality, as we have been very conservative with our use of labels. I assume that too many labels could contribute to index size on disk.

What I meant by discrepancy is that, in your original post, you mentioned your total S3 bucket size is 774G, while the logs themselves are only around 400G. That's more than 300G unaccounted for. The index should be very small: for example, our chunk storage is about 1.7TB and the index is only 900MB.

This is in general a good practice indeed.

Actually now I’m not sure, per the docs:

Retention through the Table Manager is achieved by relying on the object store TTL feature, and will work for both boltdb-shipper store and chunk/index store. However retention through the Compactor is supported only with the boltdb-shipper store.
Retention | Grafana Loki documentation

So given that, would the following schema_config/storage_config even rely on Loki retention to expire the logs, or am I not reading the docs correctly?

- from: 2020-07-01
  store: boltdb-shipper
  object_store: aws
  schema: v11
  index:
    prefix: index_
    period: 24h

- from: 2023-04-05
  store: tsdb
  object_store: aws
  schema: v12
  index:
    prefix: tsdb_index_
    period: 24h

boltdb_shipper:
  active_index_directory: /var/lib/loki/boltdb-shipper-active
  cache_location: /var/lib/loki/boltdb-shipper-cache
  cache_ttl: 120h
  shared_store: s3

tsdb_shipper:
  active_index_directory: /var/lib/loki/tsdb-index
  cache_location: /var/lib/loki/tsdb-cache
  query_ready_num_days: 14
  shared_store: s3

aws:
  endpoint: REMOVED
  access_key_id: REMOVED
  secret_access_key: REMOVED
  s3forcepathstyle: true
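For comparison, a minimal sketch of what compactor-based retention for a 14-day policy typically looks like alongside a config like the above; the working_directory path and delete delay here are illustrative assumptions, not values from our deployment:

```yaml
compactor:
  working_directory: /var/lib/loki/compactor   # put this on a persistent volume so marker files survive restarts
  shared_store: s3
  retention_enabled: true
  retention_delete_delay: 2h                   # chunks are marked first, then deleted after this delay

limits_config:
  retention_period: 336h                       # 14 days
```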