Logs Disappearing Too Soon: Loki Retention Settings on On-Prem Cluster

Hello,

I’m trying to configure Loki and its compactor on my on-premises cluster, but I’m running into an issue where logs are deleted after only a few hours, even though I’ve set the retention period to 5 days. I can’t understand why logs don’t survive beyond roughly 24 hours.

Below is the ConfigMap YAML configuration for Loki. Could someone kindly help me identify the cause?

apiVersion: v1
data:
  config.yaml: |2

    auth_enabled: false
    bloom_build:
      builder:
        planner_address: loki-backend-headless.konv-monitor.svc.cluster.local:9095
      enabled: false
    bloom_gateway:
      client:
        addresses: dnssrvnoa+_grpc._tcp.loki-backend-headless.konv-monitor.svc.cluster.local
      enabled: false
    chunk_store_config:
      chunk_cache_config:
        background:
          writeback_buffer: 500000
          writeback_goroutines: 1
          writeback_size_limit: 500MB
        default_validity: 0s
        memcached:
          batch_size: 4
          parallelism: 5
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.kl-loki-chunks-cache.konv-monitor.svc
          consistent_hash: true
          max_idle_conns: 72
          timeout: 2000ms
    common:
      compactor_address: 'http://loki-backend:3100'
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          access_key_id: loki
          bucketnames: chunks
          endpoint: http://loki:lokiloki@192.168.81.179:9000/loki
          insecure: false
          s3forcepathstyle: true
          secret_access_key: lokiloki
    compactor:
      compaction_interval: 10m
      delete_request_store: aws
      retention_delete_delay: 24h
      retention_delete_worker_count: 150
      retention_enabled: true
      working_directory: /var/loki/retention
    frontend:
      scheduler_address: ""
      tail_proxy_url: ""
    frontend_worker:
      scheduler_address: ""
    index_gateway:
      mode: simple
    limits_config:
      max_cache_freshness_per_query: 10m
      max_query_length: 2161h
      max_query_lookback: 720h
      query_timeout: 300s
      reject_old_samples: true
      reject_old_samples_max_age: 480h
      retention_period: 5d
      split_queries_by_interval: 15m
      volume_enabled: true
    memberlist:
      join_members:
      - loki-memberlist
    pattern_ingester:
      enabled: false
    query_range:
      align_queries_with_step: true
      cache_results: true
      results_cache:
        cache:
          background:
            writeback_buffer: 500000
            writeback_goroutines: 1
            writeback_size_limit: 500MB
          default_validity: 12h
          memcached_client:
            addresses: dnssrvnoa+_memcached-client._tcp.kl-loki-results-cache.konv-monitor.svc
            consistent_hash: true
            timeout: 500ms
            update_interval: 1m
    ruler:
      storage:
        s3:
          access_key_id: loki
          bucketnames: ruler
          endpoint: http://loki:lokiloki@192.168.81.179:9000/loki
          insecure: false
          s3forcepathstyle: true
          secret_access_key: lokiloki
        type: s3
    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml
    schema_config:
      configs:
      - from: "2020-07-01"
        index:
          period: 24h
          prefix: index_
        object_store: aws
        schema: v13
        store: tsdb
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
      http_server_read_timeout: 600s
      http_server_write_timeout: 600s
    storage_config:
      bloom_shipper:
        working_directory: /var/loki/data/bloomshipper
      boltdb_shipper:
        index_gateway_client:
          server_address: dns+loki-backend-headless.konv-monitor.svc.cluster.local:9095
      filesystem: {}
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
      tsdb_shipper:
        active_index_directory: /var/loki/tsdb/index
        cache_location: /var/loki/tsdb/index_cache
        cache_ttl: 1m
        index_gateway_client:
          server_address: dns+loki-backend-headless.konv-monitor.svc.cluster.local:9095
    tracing:
      enabled: false
kind: ConfigMap

Can you confirm that the index and chunk files are actually being written to your backend storage (it looks like S3)?

A couple of other things to look into:

  1. I see object_store: aws in your schema_config, but I don’t see an aws client defined anywhere under storage_config (see the sketch after this list).
  2. What do you have in your /etc/loki/runtime-config/runtime-config.yaml?
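
For point 1, here is a rough sketch of what an explicit aws client under storage_config could look like. I’m reusing the MinIO-style endpoint and credentials from your common.storage block, so treat every value as a placeholder to verify against your setup:

storage_config:
  aws:
    # assumption: same MinIO endpoint and credentials as common.storage.s3,
    # with the credentials taken out of the URL
    endpoint: http://192.168.81.179:9000
    bucketnames: chunks
    access_key_id: loki
    secret_access_key: lokiloki
    s3forcepathstyle: true
    # the endpoint is plain HTTP, so TLS is presumably not wanted
    insecure: true

Note that the compactor’s delete_request_store: aws presumably refers to the same store name, so whatever client you end up defining has to line up with that setting too.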

Hi, our runtime-config.yaml is empty:

apiVersion: v1
data:
  runtime-config.yaml: |
    {}
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: kl
    meta.helm.sh/release-namespace: konv-monitor
  creationTimestamp: "2024-10-25T15:10:16Z"
  labels:
    app.kubernetes.io/instance: kl
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/version: 3.2.0
    helm.sh/chart: loki-6.18.0
  name: loki-runtime
  namespace: konv-monitor
  resourceVersion: "31713451"
  uid: d543941c-2ad0-454a-a104-c77693696b70

but we do have a /etc/loki/local-config.yaml with the following content:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/usagestats/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false

First of all, thank you for pointing out that we hadn’t set up storage_config correctly.

I attempted to reconfigure it like this, but Loki still cannot locate the data in S3:

    schemaConfig:
      configs:
        - from: 2020-07-01
          store: tsdb
          object_store: aws
          schema: v13
          index:
            prefix: index_
            period: 24h
    storage:
      bucketNames:
        chunks: chunks
        ruler: ruler
        admin: loki
      type: s3
      s3:
        endpoint: http://loki:lokiloki@192.168.81.179:9000/loki
        accessKeyId: loki
        s3ForcePathStyle: true
        secretAccessKey: lokiloki


    storage_config:
      aws:
        s3: http://loki:lokiloki@192.168.81.179:9000/chunks
        s3forcepathstyle: true

Could you guide us on the exact way to set up storage_config? We tried the endpoint with the /loki path and without it, but we’re still having issues.

I think that still doesn’t look quite right. You can find some examples here: Single Store TSDB (tsdb) | Grafana Loki documentation
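
One thing that stands out in your last snippet: schemaConfig / storage with camelCase keys (bucketNames, accessKeyId, s3ForcePathStyle) are Helm chart values, while storage_config with snake_case keys is rendered Loki configuration, and mixing the two layers in the same place tends not to do what you expect. Since your ConfigMap labels show helm.sh/chart: loki-6.18.0, the values-side layout might look roughly like the sketch below; I’m assuming the chart nests these under the top-level loki: key, so please double-check against the chart’s values.yaml:

loki:
  schemaConfig:
    configs:
      - from: "2020-07-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
  storage:
    type: s3
    bucketNames:
      chunks: chunks
      ruler: ruler
      admin: loki
    s3:
      # same MinIO endpoint and credentials you posted, with the
      # credentials moved out of the URL
      endpoint: http://192.168.81.179:9000
      accessKeyId: loki
      secretAccessKey: lokiloki
      s3ForcePathStyle: true

If the chart renders common.storage and storage_config from these values as I’d expect, you shouldn’t need a hand-written storage_config block on top of it.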

This is our index and storage configuration:

common:
  storage:
    s3:
      bucketnames: < loki_storage_config_aws_bucketname >
      region: < aws_region >
      s3forcepathstyle: true
      sse:
        type: "SSE-S3"

schema_config:
  configs:
  - from: 2024-10-02
    store: tsdb
    object_store: s3
    schema: v13
    index:
      prefix: index_
      period: 24h
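
Adapting that to the MinIO-style endpoint from the earlier posts, and tying the retention pieces back to the original question, could look roughly like the sketch below. It is only a sketch: the endpoint, credentials and bucket name are copied from the earlier messages, and I’m assuming delete_request_store has to name the same object store the schema uses:

common:
  storage:
    s3:
      # MinIO endpoint from the earlier posts, with credentials outside the URL
      endpoint: http://192.168.81.179:9000
      bucketnames: chunks
      access_key_id: loki
      secret_access_key: lokiloki
      s3forcepathstyle: true
      # plain-HTTP endpoint, so no TLS
      insecure: true

schema_config:
  configs:
  - from: "2020-07-01"
    store: tsdb
    object_store: s3
    schema: v13
    index:
      prefix: index_
      period: 24h

compactor:
  retention_enabled: true
  # assumption: should reference the same object store the schema uses
  delete_request_store: s3

limits_config:
  # the 5-day retention from the original question
  retention_period: 5d

Once chunks are actually landing in the bucket, retention only deletes data older than retention_period; if logs still disappear after a few hours, one likely explanation is that flushes to object storage are failing, so the ingester and backend logs are worth checking for S3 errors.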