Promtail scrapes the right logs, but Loki doesn't show them

I have tried a lot of fixes, but nothing seems to work. I have a syslog-ng server that saves the incoming logs locally on my machine (the path for the logs is /var/log/remote/{HOST}/{date}.log).

I have a lot of VMs, so this is pretty populated. I have the Promtail config set up this way:

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
- job_name: local
  static_configs:
  - targets:
      - localhost
    labels:
      job: remotelogs
      __path__: /var/log/remote/**/**/*.log
  pipeline_stages:
  - match:
     selector: '{job="remotelogs"}'
     stages:
      - regex:
         expression: '^(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2})'
      - timestamp:
         source: timestamp
         format: "Jan 02 15:04:05"
         location: Europe/Bucharest
      - drop:
         expression: 'MSWinEventLog'

It scrapes every .log file from those folders and sends them to Loki, and it also extracts the timestamps so I can sort the logs by the real event time instead of the time the file was modified.

The problem here is that not all of the logs make it into Grafana when I try to match the timestamp. Some do, some don't, and it seems random. If I remove the timestamp match, all the logs come into Grafana with no problems, but they are not sorted by date, which makes them random and messy. The logs that do come through also behave strangely: when they first arrive the time is matched correctly, but a few hours later they can disappear for no apparent reason.

The most frustrating part is that Promtail sees the files just fine: I checked positions.yaml and all the files are there, the logs also look clean, and there are no problems reading from the files. The logs even show the correct timestamp, so the regex is working fine.

I have tried a million regex combinations, searched online for answers, and removed the positions.yaml file, but I have been stuck on this for days. I even tried Copilot and ChatGPT and nothing really came of it.

I checked the Loki logs and on a lot of entries (though not on all of the logs that aren't showing up) I got the error 'oldest acceptable timestamp is: 2024-11-12T13:44:12Z'. That doesn't really make sense either, because the logs that aren't showing up are from Nov 13, for example. I have also tried to set a maximum retention period, but it seems to be ignored.
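
In case it matters, the only limit I have set is the retention_period you can see in the Loki config below; I have not touched reject_old_samples or the out-of-order settings. If I am reading the Loki docs right, these are the limits that decide how old a timestamp can still be ingested, but I am not sure about that (the values below are just what I believe the defaults are, not something from my config):

# Not in my config - only a sketch of the limits I suspect are involved,
# with what I believe are the documented defaults:
limits_config:
  reject_old_samples: true          # reject samples older than the max age below
  reject_old_samples_max_age: 168h  # one week, per the docs
  unordered_writes: true            # out-of-order entries are accepted, but only within a limited window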

To be honest, I have no idea what to do next; I am really stuck on this one. Below are the Loki config and the Docker Compose file:

Loki:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  retention_period: 8750h  # Adjust this value as needed

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false

Docker Compose:

version: "3"

networks:
  loki:

services:
  loki:
    image: grafana/loki:3.0.0
    restart: unless-stopped
    volumes:
      - ./loki:/etc/loki
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/loki-config.yaml
    networks:
      - loki
    environment:
      - "TZ: Europe/Bucharest"

  promtail:
    image: grafana/promtail:latest
    restart: unless-stopped
    ports:
      - "1514:1514"
    volumes:
      #      - ./promtail/tmp:/tmp
      - /var/log:/var/log
      - ./promtail:/etc/promtail
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki
    environment:
      - "TZ: Europe/Bucharest"

  grafana:
    container_name: grafana
    image: grafana/grafana:latest
    restart: unless-stopped
    user: '0'
    volumes:
      - ./grafana:/var/lib/grafana
      - /root/loki-docker/grafana/etc/grafana.ini:/etc/grafana/grafana.ini
      - /root/loki-docker/grafana/etc/ldap.toml:/etc/grafana/ldap.toml
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    networks:
      - loki
    environment:
      - "TZ: Europe/Bucharest"
  syslog-ng:
    image: balabit/syslog-ng:latest
    restart: unless-stopped
    ports:
      - "514:514/tcp"
      - "514:514/udp"
    depends_on:
      - "promtail"
    volumes:
      - ./syslog-ng/syslog-ng.conf:/etc/syslog-ng/syslog-ng.conf
      - ./syslog-ng/cptsyslog106.conf:/etc/syslog-ng/conf.d/cptsyslog106.conf
      - /var/log/:/var/log/
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command: --no-caps
    networks:
      - loki
    environment:
      - "TZ: Europe/Bucharest"
  nginx:
    container_name: nginx
    image: nginx:alpine
    restart: unless-stopped
    volumes:
      # Nginx reverse proxy configuration files
      - ./conf.d/default.conf:/etc/nginx/conf.d/default.conf
      - ./conf.d/nginx.conf:/etc/nginx/nginx.conf
      # Letsencrypt certificates
      - ./ssl:/etc/nginx/ssl:ro
      # Persistent log storage
      - ./logs:/var/log/nginx
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      #  - "80:80"
      - "443:443"
    networks:
      - loki
    environment:
      - "TZ: Europe/Bucharest"

The fact that everything comes through once you remove the timestamp stage pretty clearly points to what your problem likely is. If your logs are in different formats (which it sounds like they are) and the regex doesn't extract a timestamp from all of them, then trying to assign a timestamp from an empty value may be the problem.

You can try to assign the timestamp only when the regex parsed successfully, by using a second match block (not tested):

pipeline_stages:
  - match:
     selector: '{job="remotelogs"}'
     stages:
      - regex:
         expression: '^(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2})'
      - drop:
         expression: 'MSWinEventLog'

  - match:
     selector: '{job="remotelogs",timestamp=~".+"}'
     stages:
      - timestamp:
         source: timestamp
         format: "Jan 02 15:04:05"
         location: Europe/Bucharest

And then you can figure out why some logs aren't being parsed correctly from there. You may also consider testing with a smaller set of log files and running Promtail with debug logging and the inspect flag enabled.
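
For the dry-run/inspect part, something along these lines is what I mean (untested sketch; I am assuming the same paths as in your compose file and the --dry-run/--inspect flags from the Promtail troubleshooting docs):

  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log
      - ./promtail:/etc/promtail
    # --dry-run prints each entry with its final timestamp and labels instead
    # of pushing it to Loki; --inspect shows how the extracted fields change
    # after every pipeline stage.
    command: -config.file=/etc/promtail/config.yml --dry-run --inspect

If you mount only a folder containing the two sample files and point __path__ at it, it should be easy to see whether both lines survive the timestamp stage.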

With this config, all the logs come through but with no parsed timestamp, as if the match was not made at all.

Also all the logs have the same format:

This is an ignored log with my first config:
Nov 13 15:11:51 [REDACTED] 1 93 Wed Nov 13 15:11:51 2024 keyboardlayout: **********************

This is a parsed log with my first config:
Nov 14 14:14:45 [REDACTED] 1 82 Thu Nov 14 14:14:45 2024 nxlogdata: 2024-11-14 14:14:43 INFO reconnecting in 1 seconds

This is the main reason why I am so confused: the format is the same for all the logs.

Try changing your regex to:

         expression: `^(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2})`

I get this error:

Unable to parse config: /etc/promtail/config.yml: yaml: line 47: found character that cannot start any token. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file

I also tried this:

'`^(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2})`'

But it sent every log through with no regard to the timestamp.

Are you able to reproduce the error reliably? You mentioned in your previous replies that you have two logs, one is parsed and the other isn’t. If you just put those two logs side by side and run promtail in debug mode, is one of them still ignored?

Do you happen to have a lot of logs that are identical to each other?

Yes, I can reproduce the problem with those exact logs (everything was and is done in debug mode). Both of them show up with no errors in Promtail, and even the extracted timestamp is clean for both:

The log that is ignored:

job:remotelogs timestamp:Nov 13 15:11:51

The log that is parsed:

job:remotelogs timestamp:Nov 14 14:55:05

In Grafana I can switch the label from job to filename and select both files, but one log won't appear and the other one will. Mind you, this is with the timestamp match applied. So it doesn't really seem like it's a problem with Promtail; it looks like it scrapes the logs just fine, timestamp included.

And well, I kind of do have identical logs: basically every log sent to the server is a PowerShell transcript, and those look exactly the same except for the host they come from and the time of the log, which are saved inside the files on the system. So no, I don't have lines that match EXACTLY, but they are really similar.