Logs disappear from Loki after some time

Hello everyone!

I have a very irritating problem with Loki which I assume must be related to a misconfiguration: after some time (usually around 30 minutes, but sometimes a few hours) older logs disappear, and only the recent ones are returned by a simple ‘take all’ query.

Below is a screenshot of the log count chart taken just after Loki and Promtail were started:

The following was taken two hours later:

My configuration seems to be quite simple; I have:

  • Promtail configured to follow one log file
  • Loki in monolithic mode, configured to store data on the filesystem

Both services are deployed using docker-compose.

Configuration files below (some sensitive data ‘obfuscated’):

docker-compose.yml

version: "3"

networks:
  grafana:
    external: true

services:
  promtail:
    image: dvp-docker.tools.finanteq.com/grafana/promtail:2.9.0
    privileged: true
    userns_mode: host
    volumes:
      - /var/log/apps:/var/log/apps
      - /opt/loki/config:/config
    command: -config.file=/config/promtail-config.yml
    networks:
      - grafana

  loki:
    image: dvp-docker.tools.finanteq.com/grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - /opt/loki/config:/config
    command: -config.file=/config/loki-config.yml
    networks:
      - grafana

promtail-config.yml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
- job_name: file_logs
  static_configs:
  - targets:
    - localhost
    labels:
      app: server
      __path__: /var/log/apps/application.log
  pipeline_stages:
  - match:
      selector: '{app="server"}'
      stages:
      - multiline:
          firstline: '^\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\+\d{2}:\d{2}\]'
          max_wait_time: 3s
          max_lines: 100000 # a single entry can be quite long
      - regex:
          expression: '^\[(?P<timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\+\d{2}:\d{2})\]\s(?P<level>[A-Z]{4,5})\s\[serverVersion:\s(?P<serverVersion>\d+\.\d+\.\d+(-SNAPSHOT)?)?\] (?P<message>(?s:.*))$' # shortened
      - labels: # not all labels
          level:
          serverVersion:
      - template:
          source: timestamp
          template: '{{ Replace .Value " " "T" 1}}'
      - template:
          source: timestamp
          template: '{{ Replace .Value "," "." 1}}'
      - timestamp:
          source: timestamp
          format: '2006-01-02T15:04:05.999-07:00'
      - structured_metadata: # not all metadata
          timestamp:
      - output:
          source: message
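
For context, the log entries this pipeline expects look roughly like the following (values are made up); the two template stages rewrite the timestamp into 2023-10-20T07:15:42.123+02:00 before the timestamp stage parses it:

[2023-10-20 07:15:42,123+02:00] ERROR [serverVersion: 1.2.3-SNAPSHOT] Something went wrong
  any following line that does not start with a bracketed timestamp is appended to this entry by the multiline stage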

loki-config.yml

auth_enabled: false

server:
  grpc_server_max_recv_msg_size: 26214400 # 25 MiB
  grpc_server_max_send_msg_size: 26214400 # 25 MiB

limits_config:
  allow_structured_metadata: true
  max_line_size: 10kB
  max_line_size_truncate: true # for now I'm fine with truncating very big entries

common:
  path_prefix: /loki
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
    shared_store: filesystem
  filesystem:
    directory: /loki/data

analytics:
  reporting_enabled: false
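
For completeness: I have not configured the compactor or any retention, and as far as I know retention is disabled by default, so nothing in this config should be deleting chunks on its own. An explicit retention setup would look roughly like this (just a sketch for comparison, not something I am running):

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  retention_enabled: true

limits_config:
  retention_period: 744h # example value, roughly 31 days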

I searched through the Loki logs but nothing really caught my attention. If you need them, I will gladly attach them.

I would be really grateful if anybody could point out what might be the reason for this rather strange behaviour :pray:

Cheers!

I don’t see anything obviously wrong, but where is your data volume mount for the Loki container?

The volume is not there yet, since I wanted to have clean chunk and index directories whenever I restarted the container after some adjustments to the config. I planned to add it as soon as everything else works as expected :slight_smile:
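
For reference, the plan is to mount the whole /loki directory (it is the path_prefix, so it covers the index, cache and chunks); the host path below is just an example:

    volumes:
      - /opt/loki/config:/config
      - /opt/loki/data:/loki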

So did you check whether the container was restarted?
What does the “take all” query look like?

It has not been restarted; below is an excerpt from the docker inspect output:

{
    ...
    "State": {
      "Status": "running",
      "Running": true,
      "Paused": false,
      "Restarting": false,
      "OOMKilled": false,
      "Dead": false,
      "Pid": 133920,
      "ExitCode": 0,
      "Error": "",
      "StartedAt": "2023-10-20T04:38:34.001250406Z",
      "FinishedAt": "0001-01-01T00:00:00Z"
    },
    ...
    "RestartCount": 0
    ...
}

As for the query, it is simply: {app="server"}

I guess you have a limit on that query, e.g. 1000. You should use a metric query (e.g. counting log lines) rather than fetching whole log lines, since you only use the count in the graph anyway. It will be more efficient.

I’m not sure I understand… Are we talking about the line limit configured in Grafana?

If so, how is that related to me not being able to fetch older logs after some time?

Correct. There is some auto limit, let’s say X, so the blue and red sets contain the same number of lines (X):

That’s a hypothesis and only you can prove it. Set a high line limit (don’t use auto) and check the time periods of these two graphs again.

OK, I just did as you suggested. I could not run it over the same time period as before, because logs from then are no longer queryable, but I chose the last 3 hours as the time range; the results are below:

50-line limit

5000-line limit

I believe the limit only affects the number of log lines returned, not the count graph.

Just to add: the problem persists, since logs from before 10:30 were still present when I checked around 11:00.

Please check a metric query, e.g.:

Hmmm, this is what I received:

OK, that was the wrong query; here is a better one:

sum(
  count_over_time(
    {
      app="server"
    }
    [$__interval]
  )
)
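
The count in that query is computed by Loki itself, so the Grafana line limit should not come into play. For a quick sanity check you can also run it as an instant query over a fixed window, e.g. (24h is just an example):

sum(count_over_time({app="server"}[24h]))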

Ok, this is what I got:

How does the same time range look now?

Some logs are already gone:

There is another surprising behaviour I noticed this morning: when I executed a query spanning the whole of 20 October (the day I started the container and the day I posted here), logs were returned:

But when I executed exactly the same query a second time, they were already gone:

Ok, so that looks like a problem on the Loki side. :person_shrugging:

Do you suggest raising a bug on GitHub?

No, you should investigate and observe the problem first.
E.g. check the logs, observe the pattern of how logs disappear, look at time patterns, try to replicate it on a different machine, …

I am having the same issue. Let me know if you have any additional information since your last post.