*Resolved* - Promtail "runs away" until system lockup (~5 seconds) on a near-stock Debian Stable (bookworm) install

Hello everyone, first post. I am trying to build an IDS panel leveraging on-prem (no, I won't pay for the cloud, ever) Grafana + Loki + Promtail + Snort 3.

I have all of these working EXCEPT Promtail, because it just loops and chokes itself to death.

VM hosted on Proxmox VE with 4 vCPUs and 24 GB of RAM

Version:

promtail, version 2.8.2 (branch: HEAD, revision: 9f809eda7)
  build user:       root@b7e9ca0bf6e0
  build date:       2023-05-03T11:13:57Z
  go version:       go1.20.4
  platform:         linux/amd64

The Issue:

  • Regardless of whether the VM has 8 GB or 64 GB of RAM, as soon as the Promtail service starts, it leaks/runs away until complete lockup (under 20 seconds)

What have I tried:

  • Different vCPU types (now on the host CPU, an EPYC 7551P, which supports AVX2, to be safe)
  • Different RAM
  • Various bandwidth limits in the .yaml file (no change in end result)

Current Configs:

  • Promtail .service file
[Unit]
Description=Promtail Service
After=network.target

[Service]
Type=simple
User=promtail
ExecStart=/opt/loki/promtail-linux-amd64 -config.file=/opt/loki/promtail-local-config.yaml

[Install]
WantedBy=multi-user.target
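
In hindsight, a guardrail that would have made debugging far less painful: systemd resource controls can cap a runaway process so the kernel kills it instead of the whole VM locking up. A sketch of a drop-in I could have used (assumes cgroup v2, the Debian bookworm default; the limits here are just example numbers):

# /etc/systemd/system/promtail.service.d/limits.conf (hypothetical drop-in)
[Service]
# Hard memory cap: Promtail gets OOM-killed instead of taking the host down
MemoryMax=2G
# Allow at most two of the four vCPUs
CPUQuota=200%

Apply with systemctl daemon-reload and then systemctl restart promtail.
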
  • Current YAML file for Promtail
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

limits_config:
  readline_rate_enabled: true
  readline_rate: 10
  readline_burst: 20

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/*log
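
Side note for anyone debugging something similar: Promtail has a dry-run mode that tails the configured targets and prints what it would push instead of sending anything to Loki, which lets you watch it work (or choke) without involving the server. Paths here are from my install:

sudo -u promtail /opt/loki/promtail-linux-amd64 \
  -config.file=/opt/loki/promtail-local-config.yaml \
  -dry-run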

Just for reference:

  • Current Loki YAML file
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
  • All the other services
user@host:/opt/opensearch$ systemctl status snort3
● snort3.service - Snort Daemon
     Loaded: loaded (/etc/systemd/system/snort3.service; enabled; preset: enabled)
     Active: active (running) since Thu 2023-07-06 23:43:59 EDT; 43min ago
   Main PID: 636 (snort)
      Tasks: 2 (limit: 28769)
     Memory: 295.5M
        CPU: 33.325s
     CGroup: /system.slice/snort3.service
             └─636 /usr/local/bin/snort -c /usr/local/etc/snort/snort.lua -s 65535 -k none -l /var/log/snort -D -i ens18 -m 0x1b -u snort ->

user@host:/opt/opensearch$ systemctl status grafana-server.service 
● grafana-server.service - Grafana instance
     Loaded: loaded (/lib/systemd/system/grafana-server.service; enabled; preset: enabled)
     Active: active (running) since Thu 2023-07-06 23:44:02 EDT; 43min ago
       Docs: http://docs.grafana.org
   Main PID: 890 (grafana)
      Tasks: 20 (limit: 28769)
     Memory: 172.6M
        CPU: 5.336s
     CGroup: /system.slice/grafana-server.service
             └─890 /usr/share/grafana/bin/grafana server --config=/etc/grafana/grafana.ini --pidfile=/run/grafana/grafana-server.pid --pack>

user@host:/opt/opensearch$ systemctl status loki.service
● loki.service - Loki logging daemon
     Loaded: loaded (/etc/systemd/system/loki.service; enabled; preset: enabled)
     Active: active (running) since Thu 2023-07-06 23:43:59 EDT; 43min ago
   Main PID: 632 (loki-linux-amd6)
      Tasks: 9 (limit: 28769)
     Memory: 93.5M
        CPU: 5.291s
     CGroup: /system.slice/loki.service
             └─632 /opt/loki/loki-linux-amd64 -config.file=/opt/loki/loki-local-config.yaml

Thank you for your time,
Scott

RESOLVED!!

There was a corrupt system log file in /var/log

I ran ls -lshat /var/log and noticed that /var/log/lastlog was 531 GIGABYTES.

For a system with less than 100 GB of storage, that looked impossible. It turns out lastlog is a sparse file indexed by UID, so its apparent size can vastly exceed what it actually occupies on disk, but Promtail still has to read through the full apparent size, which would explain it buffering itself to death.
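
If you want to confirm a file like this is sparse rather than genuinely filling the disk, compare its apparent size against the blocks it actually occupies (standard GNU coreutils):

ls -lsh /var/log/lastlog                 # first column: blocks on disk; size column: apparent size
du -h /var/log/lastlog                   # actual disk usage
du -h --apparent-size /var/log/lastlog   # the size Promtail has to read through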

I cleared the log file by running (as root) >/var/log/lastlog and it emptied it out. Be aware that this file tracks who has signed into the machine, so there may be operational consequences for you if you remove or truncate it.
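
Longer term, the cleaner fix is probably to keep Promtail away from binary/sparse files entirely, since the glob /var/log/*log also matches /var/log/lastlog. A sketch using Promtail's __path_exclude__ label (it exists in 2.8, but double-check the docs for your version):

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/*log
      __path_exclude__: /var/log/lastlog   # binary, sparse, and not line-oriented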

Restarted Promtail, and this time the CPU and RAM didn't run away and lock the VM up.