Hi,
I am running a single-binary Loki 3.2.1 with a filesystem TSDB store, and my logs gradually fade out (within a few hours, mostly determined by max_chunk_age) once they are flushed off the ingester. After that I can no longer access them through the Grafana dashboard, even though no errors are logged.
I came across this issue, which seems to be exactly what I am experiencing. Is there any workaround currently available, apart from downgrading to 2.8.4?
My config:
target: all,write
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
  grpc_server_max_concurrent_streams: 1000

common:
  instance_addr:
  path_prefix: /var/lib/loki
  instance_interface_names:
    - eth0
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
  instance_enable_ipv6: true

ingester:
  chunk_encoding: zstd
  max_chunk_age: 6h
  chunk_idle_period: 3h
  chunk_target_size: 16777216

ingester_rf1:
  enabled: false

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

frontend:
  encoding: protobuf

compactor:
  working_directory: /var/lib/loki/retention
  compaction_interval: 1h
  retention_enabled: false
  retention_delete_delay: 24h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

table_manager:
  retention_period: 365d

limits_config:
  retention_period: 365d
  retention_stream: []
  max_query_parallelism: 16
  discover_log_levels: false
From this image it can clearly be seen that the old logs gradually disappear, which matches the flush events in the Loki log.
If I decrease max_chunk_age, the logs disappear even faster.
A couple of questions for you:
- You are running a single instance of Loki, correct?
- What is the filesystem based on? Docker host mounted volume? EFS? Some other storage solution?
Also, this is likely not related to your problem, but you probably want to disable the table manager and just use the compactor.
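For example, retention driven only by the compactor would look roughly like this sketch (paths just mirror your config above; adjust retention_period as needed):

```yaml
# Rough sketch: retention handled by the compactor only, table_manager removed.
# Paths mirror the config above; adjust retention_period as needed.
compactor:
  working_directory: /var/lib/loki/retention
  compaction_interval: 1h
  retention_enabled: true            # let the compactor apply retention
  retention_delete_delay: 24h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

limits_config:
  retention_period: 365d             # global retention, enforced by the compactor
```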
> You are running a single instance of Loki, correct?

Correct.
> What is the filesystem based on? Docker host mounted volume? EFS? Some other storage solution?

Plain ZFS, assigned by Proxmox VE to an LXC container.

This problem already occurred before the table_manager setting was in place.
I'd double-check that chunks are actually being written to your permanent storage (and make sure /var/lib/loki lives outside the container).
If your logs are disappearing after flushing, then they either aren't being written to your chunk storage or they're being written to the wrong place.
> I'd double-check that chunks are actually being written to your permanent storage (and make sure /var/lib/loki lives outside the container).

An LXC container is similar to a VM; everything is persistent. Loki is not running in a one-off container like Docker.

> If your logs are disappearing after flushing, then they either aren't being written to your chunk storage or they're being written to the wrong place.

I can find the index files and chunk files in the expected place, and the directory size is actually increasing. Unless there is a separate path setting for the querier, I'd say it behaves as expected.
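As far as I can tell there is no such separate path configured. If there were, it would be an explicit storage_config block like the sketch below; the directory names here are assumptions for illustration, not taken from my actual config:

```yaml
# Sketch only: an explicit storage_config is what a "separate path setting"
# for the read side would look like. Directory names are assumptions.
storage_config:
  tsdb_shipper:
    active_index_directory: /var/lib/loki/tsdb-index   # where TSDB index files are built
    cache_location: /var/lib/loki/tsdb-cache            # where downloaded index files are cached
  filesystem:
    directory: /var/lib/loki/chunks                      # where chunks are read from
```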
- What is the latest timestamp on the index and chunk files written to the filesystem?
- Try changing target: all,write to target: all (see the snippet below).
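That is, just the target line; in a single binary, all should bring up the read components (querier, query-frontend) as well as the write path:

```yaml
# Single binary running every component, read path included; the extra
# write target is redundant here, since all already covers the
# distributor and ingester.
target: all
```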
Does your Loki container produce any errors from the querier or query-frontend when querying older logs?
After tinkering with the config file here and there, followed by a hard crash of the entire VM, the problem seems to be fixed.
It looks like the culprit was instance_addr / instance_interface_names in the common section, which I shouldn't have touched when using the inmemory ring store.
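So the common block now looks roughly like this (same paths as before, just without the two instance_* keys):

```yaml
# common block with instance_addr and instance_interface_names removed;
# with an inmemory ring on a single instance they aren't needed.
common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
  instance_enable_ipv6: true
```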
Also:

> Does your Loki container produce any errors from the querier or query-frontend when querying older logs?

There is no error indicated. There are lots of debug logs from mock.go saying Get - deadline exceeded.