How Grafana Loki retention works

I would like to set up retention for logs stored in Grafana Loki, but I am not sure how retention really works. There are several time periods to configure, and the documentation does not make clear how the whole retention process works.

This is my Loki configuration:

common:
  instance_interface_names:
    - "lo"
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_interface_names:
      - "lo"
    kvstore:
      store: inmemory

compactor:
  working_directory: /loki/compactor
  compaction_interval: 5m
  retention_enabled: true
  retention_delete_delay: 5m
  delete_request_cancel_period: 1m

limits_config:
  retention_period: 10m

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
      chunks:
        prefix: chunk_
        period: 168h

The compactor should remove old data based on this configuration. This is how I understand the config file:

  • compactor.compaction_interval - the compactor process runs every 5 minutes
  • compactor.retention_enabled - enables applying the retention policy
  • compactor.retention_delete_delay - how long the compactor waits before deleting chunks after retention_period has been exceeded
  • compactor.delete_request_cancel_period - allows cancelling a delete request made through the API. I think this is not used in my case at all.
  • limits_config.retention_period - the time after which data should be deleted. According to the docs, the minimum retention period is 24h.

According to this configuration, I would expect to have logs for the last 15 minutes:

  • 0:00 - first log arrives
  • 0:05 - the compactor runs and, because retention is 10 min, no logs are due for deletion
  • 0:10 - the compactor runs again and, because retention is 10 min, my first log should be marked for deletion. Because of the 5 min retention delete delay, it is not actually deleted yet.
  • 0:15 - the compactor runs for the third time. The log was marked during the second run and the 5 min delete delay has expired, so it can be deleted now.
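
To make the arithmetic behind this schedule explicit, here is a minimal Python sketch of my mental model. It is only an assumption about how the timing adds up, not a description of what Loki actually does: it pretends the compactor runs at exact multiples of compaction_interval counted from the arrival of the log, and it ignores chunk flushing and everything else. The function names are hypothetical helpers of mine.

```python
from datetime import timedelta


def next_run_at_or_after(offset: timedelta, interval: timedelta) -> timedelta:
    """First compactor run (a whole multiple of `interval`) at or after `offset`."""
    runs = -((-offset) // interval)  # ceiling division on timedeltas
    return max(runs, 1) * interval


def expected_schedule(retention_period, delete_delay, compaction_interval):
    """Return (marked, deleted) as offsets from the arrival of the log.

    Assumed model: the log is marked on the first run at which it is at least
    `retention_period` old, and deleted on the first run at which
    `delete_delay` has passed since it was marked.
    """
    marked = next_run_at_or_after(retention_period, compaction_interval)
    deleted = next_run_at_or_after(marked + delete_delay, compaction_interval)
    return marked, deleted


marked, deleted = expected_schedule(
    retention_period=timedelta(minutes=10),
    delete_delay=timedelta(minutes=5),
    compaction_interval=timedelta(minutes=5),
)
print("marked after:", marked, "deleted after:", deleted)
```

With the values from my configuration this reproduces the 0:10 and 0:15 steps from the list. The same function can be fed larger values (for example a 72h retention period) to see what this simplified model would predict.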

This is just an idealized schedule. I understand that these are asynchronous processes and the timing is not that precise, so let's say that, by my logic, this could take 30-40 minutes.

I am running this configuration and I can still see logs for the last 2.5 hours: the first log is at 20:43 and the last one at 23:00. This is quite long compared to my estimate. I do not intend to use such a short period in practice, but how can I estimate log retention for, say, 3 days, 1 week or 2 months?

Questions:

  1. How is retention calculated and processed?
  2. Do I understand Loki configuration parameters properly?
  3. How does retention differ from chunks.period and index.period under schema_config.configs?

I am also interested in knowing exactly how retention works. Hopefully someone from the Loki team can shed some light on it.

If you look at the /loki/chunks directory, you will be surprised to find that the log files are not deleted, causing the disk space to become full. I’m also interested in how this retention works.


Can someone please answer these questions?
I am also very interested in how to set retention and what exactly each of the parameters means.

Why is there no answer to this question? The documentation is ugly, as usual with Grafana: plenty of parameters and no examples:

# Tables older than this retention period are deleted. Must be either 0
# (disabled) or a multiple of 24h. When enabled, be aware this setting is
# destructive to data!
# CLI flag: -table-manager.retention-period
[retention_period: <duration> | default = 0s]

Thank you. How about the duration, is it in seconds, minutes, hours, days… F****

How does it work really, who knows?

Don’t know about seconds. But I tried to set 1h, 4h, 8h - nothing changed. Now I am trying 24h period. I’ll tell you if something changed tomorrow.

From my experience, this is how retention works:

The compactor runs in a loop, once every compaction_interval. From the docs:

The Compactor loops to apply compaction and retention at every compaction_interval, or as soon as possible if running behind.

It will mark chunks that need to be deleted - this will not delete the chunks!

Retention period - configures how old data must be before the compactor marks its chunks for deletion. The marked chunks are actually deleted only after retention_delete_delay has passed. From the docs:

  • Note that retention is only available if the index period is 24h.
  • The minimum retention period is 24h.

This means the retention period must be at least 24h. From my experience, it somehow works even with shorter periods, but this is not what Loki is designed for.

Answers to my questions from this topic:

1. How is retention calculated and processed?
Compaction runs in a loop (every compaction_interval, or as soon as possible if running behind). Retention also runs in a loop and checks whether chunks are older than retention_period. Keep in mind that all of the above runs in loops. Here is an example of compacting (I configured compaction_interval to 10m):

2. Do I understand Loki configuration parameters properly?
From my experience, my understanding of the configuration was right:

  • compactor.compaction_interval - the interval between compactor loops
  • compactor.retention_enabled - enables or disables applying the retention policy
  • compactor.retention_delete_delay - how long the compactor waits before deleting chunks after retention_period has been exceeded
  • compactor.delete_request_cancel_period - a delete request may be cancelled within delete_request_cancel_period. In my opinion, this only makes sense when you use the API.
  • limits_config.retention_period - the time after which data should be deleted. The minimum is 24h according to the docs.

3. How does retention differ from chunks.period and index.period under schema_config.configs?
Currently, I am not able to find these settings in the documentation.

I hope this helped. If yes, I would mark it as Solution.

From the comment, I would expect that only hours in integer multiples are supported. That should be enough; I cannot imagine a requirement for retention in seconds or minutes. I can see the default value is given in seconds (default = 0s), but I would ignore that. Maybe you can try setting the duration in hours, somewhere between 1 and 2 days, e.g. 25h or 32h.

Valid values: 24h, 48h, …, 336h (this is my retention). You can try e.g. 36h and see if it works.
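
To make the rule concrete, here is a small Python sketch of how I read the quoted comment for -table-manager.retention-period ("either 0 (disabled) or a multiple of 24h"). It only encodes that wording; is_allowed_retention is a hypothetical helper of mine, not something Loki ships, and I have not checked how Loki itself validates the value.

```python
from datetime import timedelta

DAY = timedelta(hours=24)


def is_allowed_retention(period: timedelta) -> bool:
    """True if `period` is 0 (retention disabled) or a positive multiple of 24h."""
    if period == timedelta(0):
        return True
    return period > timedelta(0) and period % DAY == timedelta(0)


# Check a few candidate values, given in hours.
for hours in (0, 24, 25, 36, 48, 336):
    print(f"{hours}h ->", is_allowed_retention(timedelta(hours=hours)))
```

By this reading, 25h, 32h and 36h would not qualify, so trying one of them is a quick way to see how strictly the rule is enforced in practice.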

Well, I can show how we finally handled this.

# 2023-02-09 RC
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
  - from: 2023-02-08
    store: boltdb-shipper
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 24h


# more info here https://ourserver.com/jira/browse/MONI-24
compactor:
  # working_directory is the directory where marked chunks and temporary tables will be saved.
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  # retention_delete_delay is the delay after which the Compactor will delete marked chunks.
  retention_delete_delay: 15m
  # retention_delete_worker_count specifies the maximum quantity of goroutine workers instantiated to delete chunks
  retention_delete_worker_count: 150


table_manager:
  retention_deletes_enabled: true
  retention_period: 15d

ruler:
  alertmanager_url: http://localhost:9093 

querier:
  max_concurrent: 2048

frontend:
  max_outstanding_per_tenant: 2048
  compress_responses: true

Hey, did you get an answer about deleting the chunk files in ‘/loki/chunks’?