We scaled our Loki instance from one to three replicas so that we are less likely to lose data. It uses S3 with boltdb-shipper. Generally this works, but data from the most recent 15 minutes is missing from queries (i.e. the data appears in 15-minute chunks at xx:00, xx:15, xx:30 and xx:45).
I could not work out from the docs what causes this problem. We are not using the full microservices deployment, because it is not necessary at our current load.
Here is a graph that shows that there are no logs for the last 15 minutes. A few minutes later it would show all the data until 17:15.
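To make the symptom concrete, here is a minimal sketch of the behaviour as it appears from the outside: queries seem to return data only up to the most recent quarter-hour boundary. This is only a model of what I observe, not a description of how Loki works internally. The function name `latest_visible_boundary` and the example date are made up for illustration, and in practice the new chunk shows up slightly after the boundary rather than exactly on it.

```python
from datetime import datetime


def latest_visible_boundary(now: datetime, interval_minutes: int = 15) -> datetime:
    """Floor `now` to the previous multiple of `interval_minutes` past the hour.

    Hypothetical helper: it models the observed symptom only.
    """
    floored_minute = now.minute - (now.minute % interval_minutes)
    return now.replace(minute=floored_minute, second=0, microsecond=0)


# Arbitrary example: querying at 17:22 only shows logs up to 17:15.
query_time = datetime(2021, 11, 17, 17, 22)
print(latest_visible_boundary(query_time))  # 2021-11-17 17:15:00
```

With a single replica, the same query returned logs right up to the query time.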
This is our current config. We are using the Helm template with 3 replicas:
auth_enabled: false
table_manager:
  retention_deletes_enabled: true
  retention_period: 168h
server:
  http_listen_port: 3100
memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:
    - loki-headless.default.svc.cluster.local:7946
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
distributor:
  ring:
    kvstore:
      store: memberlist
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      replication_factor: 1
      kvstore:
        store: memberlist
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
schema_config:
  configs:
    - from: 2021-11-17
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  aws:
    bucketnames: loki-xxxxxxxx
    region: eu-west-1
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
  index_queries_cache_config:
    redis:
      endpoint: loki-default-keydb.default.svc.cluster.local:6379
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h