Logs are gone after flushing from the ingester

Hi,

I am running a single-binary Loki 3.2.1 with a filesystem TSDB store, and my logs gradually disappear (within a few hours, roughly matching max_chunk_age) once they are flushed from the ingester. I can no longer access them through the Grafana dashboard, even though no errors are logged.

I came across this issue, which seems to be exactly what I am experiencing. Is there any workaround currently available, apart from downgrading to 2.8.4?

My config:

target: all,write
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
  grpc_server_max_concurrent_streams: 1000

common:
  instance_addr: 
  path_prefix: /var/lib/loki
  instance_interface_names:
    - eth0
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
    instance_enable_ipv6: true

ingester: 
  chunk_encoding: zstd
  max_chunk_age: 6h
  chunk_idle_period: 3h
  chunk_target_size: 16777216

ingester_rf1:
  enabled: false

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

frontend:
  encoding: protobuf

compactor:
  working_directory: /var/lib/loki/retention
  compaction_interval: 1h
  retention_enabled: false
  retention_delete_delay: 24h
  retention_delete_worker_count: 150  
  delete_request_store: filesystem

table_manager:
  retention_period: 365d

limits_config:
  retention_period: 365d
  retention_stream: []
  max_query_parallelism: 16
  discover_log_levels: false

From this image it can clearly be seen that old logs gradually disappear. This corresponds to the flush events in the Loki log:

If I decrease max_chunk_age, the logs disappear faster.

A couple of questions for you:

  1. You are running a single instance of Loki, correct?
  2. What is the filesystem based on? Docker host mounted volume? EFS? Some other storage solution?

Also, this is likely not related to your problem, but you probably want to disable the table manager and just use the compactor.
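
For illustration, this is roughly what compactor-driven retention could look like, reusing the values already in your posted config (only retention_enabled changes, and the table_manager block is dropped entirely):

compactor:
  working_directory: /var/lib/loki/retention
  compaction_interval: 1h
  retention_enabled: true             # compactor applies retention instead of the table manager
  retention_delete_delay: 24h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

limits_config:
  retention_period: 365d              # the compactor deletes data older than this

With retention_enabled: true, the compactor handles deletion based on limits_config.retention_period, so the table_manager block is no longer needed.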

  1. You are running a single instance of Loki, correct?

Correct.

  1. What is the filesystem based on? Docker host mounted volume? EFS? Some other storage solution?

Plain ZFS, assigned by a Proxmox VE LXC container.

This problem occurred before the table_manager setting was in place.

I’d say double check and make sure chunks are actually written to your permanent storage (and make sure /var/lib/loki is mounted from outside the container).

If your logs are disappearing after flushing, then they aren’t being written to your chunk storage, or they’re being written to the wrong place.

I’d say double check and make sure chunks are actually written to your permanent storage (and make sure /var/lib/loki is mounted from outside the container).

An LXC container is similar to a VM: everything is persistent. Loki is not running in a one-off container like Docker.

If your logs are disappearing after flushing, then they aren’t being written to your chunk storage, or they’re being written to the wrong place.

I can find the index files and chunk files in the expected place, and the directory size is actually increasing. Unless there is a separate path setting for the querier, I’d say it behaves as expected.

  1. What is the latest timestamp for your index and chunk files written to the filesystem?
  2. Try changing target: all,write to target: all (see the snippet below)
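
For the second point, that would just be the top-level target line, with everything else left as posted:

target: all  # instead of target: all,write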

Does your Loki container produce any errors from the querier or query-frontend when querying older logs?

After tinkering with the config file here and there, followed by a hard crash of the entire VM, the problem seems to be fixed.

It looks like the culprit was instance_addr / instance_interface_names in the common section, which I shouldn’t have touched when using the inmemory ring store.
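
In case anyone else hits this, the common section with those two settings removed looks roughly like this (everything else unchanged from the config above):

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
    instance_enable_ipv6: true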

Also

Does your Loki container produce any errors from the querier or query-frontend when querying older logs?

No errors are indicated. There are, however, lots of debug log lines about mock.go Get - deadline exceeded.