We scaled our Loki instance from one to three replicas so that we are less likely to lose data. It uses S3 with boltdb-shipper. This generally works, but the most recent data is missing: logs only become visible in 15-minute chunks, at xx:00, xx:15, xx:30 and xx:45, so up to the last 15 minutes cannot be queried.
I could not figure out from the docs what causes this. We are not using the full microservices deployment, because it is not necessary at our current load.
Here is a graph that shows that there are no logs for the last 15 minutes. A few minutes later it would show all the data until 17:15.
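To make the observed pattern concrete, here is a minimal Python sketch of it. It only models what we see in the graph (data visible up to the most recent quarter-hour boundary); it is not how Loki computes anything, and the function names and the example timestamp are made up for illustration.

```python
from datetime import datetime, timedelta


def last_visible_boundary(now: datetime, interval_minutes: int = 15) -> datetime:
    """Floor `now` to the most recent multiple of `interval_minutes` past the hour.

    Hypothetical helper: models the observation that logs are only
    queryable up to xx:00, xx:15, xx:30 or xx:45.
    """
    floored_minute = now.minute - (now.minute % interval_minutes)
    return now.replace(minute=floored_minute, second=0, microsecond=0)


def invisible_window(now: datetime, interval_minutes: int = 15) -> timedelta:
    """Length of the most recent stretch of logs that is not yet visible."""
    return now - last_visible_boundary(now, interval_minutes)


# Illustrative timestamps only; the date is an assumption, not taken from the graph.
before = datetime(2021, 11, 17, 17, 13, 42)
after = datetime(2021, 11, 17, 17, 16, 0)

print(last_visible_boundary(before))  # data visible up to 17:00
print(invisible_window(before))       # gap of 13 minutes 42 seconds
print(last_visible_boundary(after))   # a few minutes later: visible up to 17:15
```

So depending on when the query runs, the gap is anywhere between 0 and 15 minutes, which matches what the graph shows.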
This is our current config. We are using the Helm template with 3 replicas:
auth_enabled: false
table_manager:
  retention_deletes_enabled: true
  retention_period: 168h
server:
  http_listen_port: 3100
memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:
    - loki-headless.default.svc.cluster.local:7946
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
distributor:
  ring:
    kvstore:
      store: memberlist
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      replication_factor: 1
      kvstore:
        store: memberlist
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
schema_config:
  configs:
    - from: 2021-11-17
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  aws:
    bucketnames: loki-xxxxxxxx
    region: eu-west-1
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
  index_queries_cache_config:
    redis:
      endpoint: loki-default-keydb.default.svc.cluster.local:6379
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h