Promtail error writing positions.yaml

I’m trying to collect almost 3 GB of log files every day with Promtail.
Promtail runs fine for a few hours and then starts throwing errors such as:

Jul 07 14:54:53 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:54:53.188346106Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml3507440037425283515: too many open files"

Jul 07 14:54:58 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:54:58.078325647Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml6187472917671051760: too many open files"

Jul 07 14:55:03 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:03.246668661Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml5397198011226637596: too many open files"

Jul 07 14:55:08 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:08.157937338Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml4642695468272810694: too many open files"

Jul 07 14:55:13 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:13.048319052Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml6508309409450339181: too many open files"

Jul 07 14:55:18 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:18.146311846Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml4642093189922208207: too many open files"

Jul 07 14:55:23 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:23.149764251Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml3585195896468389427: too many open files"

Note:

  1. Not running in containers or pods, but on a VM.
  2. The Loki and Promtail configs are sized accordingly, and both run on the same host (aarch64).

Earlier GitHub issues suggested changing the following:
fs.inotify.max_user_watches = 100000
fs.inotify.max_user_instances = 512
fs.inotify.max_queued_events = 100000

ulimit -n : 1000000
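
(For completeness, applying these persistently looks roughly like the sketch below; the sysctl drop-in file name and the promtail unit name are illustrative, not taken from the actual setup.)

    # /etc/sysctl.d/99-inotify.conf  (illustrative file name)
    fs.inotify.max_user_watches = 100000
    fs.inotify.max_user_instances = 512
    fs.inotify.max_queued_events = 100000

    # reload sysctl settings without a reboot
    sudo sysctl --system

    # `ulimit -n` in a shell does not apply to a systemd service; raise the
    # service's own limit instead (assuming the unit is named promtail.service)
    sudo systemctl edit promtail.service     # add under [Service]: LimitNOFILE=1000000
    sudo systemctl restart promtail.service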

I changed those, but still could not identify the problem.
Need help…

  1. How many file descriptors are you actually using? lsof should tell you; there is a chance you are still running over the limit (a quick way to count is sketched after this list).

  2. I notice the file being written is randomly named, which is a bit weird. What does your promtail configuration look like?
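
For the first question, counting looks something like this (assuming the process name matches the binary in the logs above; run as root or the promtail user so /proc is readable):

    pid=$(pgrep -f promtail-linux-arm64)

    # open file descriptors currently held by promtail
    ls /proc/$pid/fd | wc -l

    # same idea via lsof (also counts sockets, pipes, etc.)
    lsof -p "$pid" | wc -l

    # the open-file limit the running process actually sees
    grep 'Max open files' /proc/$pid/limits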

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /local/promtail/promtail_positions.yaml
  sync_period: 5s
  ignore_invalid_yaml: true

clients:

scrape_configs:
  - job_name: buildlogs
    static_configs:
      - targets:
          - localhost
        labels:
          job: buildlogs
          __path__: /home/user/1*/gts.//*.log
    pipeline_stages:
      - match:
          selector: '{job="logs"}'
          stages:
            - regex:
                source: filename
                expression: '(?:user)/(?P<release>\S+?)_(?P<number>\S+?)/gts.(?P<application>\S+?)/(?P<tag>\S+?)/(?P<toolchain>\S+?)-(?P<type>debug)-(?P<target>\S+?).lastSuccessful.log'
            - labels:
                release:
                number:
                application:
                tag:
                toolchain:
                type:
                target:

limits_config:
  readline_rate: 1000000
  readline_burst: 10000000
  max_streams: 100
  max_line_size: 51120M

target_config:
  sync_period: "10s"
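
(As an aside, a pipeline like this can be sanity-checked against a sample line without shipping anything to Loki, provided the promtail build supports the --stdin/--dry-run/--inspect flags; the config path below is illustrative.)

    # feed a sample line through the config and print what each stage extracts
    cat sample.log | ./promtail-linux-arm64 \
        -config.file=/local/promtail/promtail.yaml \
        --stdin --dry-run --inspect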

How many file descriptors is promtail actually using? Do you perhaps have a lot of log files?

Yes, I do have a lot of log files.
Promtail is already using more than 60,000 file descriptors.

If you have a lot of log files, I am not sure there is a good way around it besides increasing the file descriptor limit via ulimit. You can check the maximum kernel limit, and as long as you don’t get too close to it you should be fine. You could also consider moving the files to other servers with tools such as rsync or NFS if you don’t want to risk it, but that of course adds complexity to your setup.
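
For example, these read-only checks show the kernel-side ceilings you would be comparing against:

    # system-wide maximum number of open file handles
    cat /proc/sys/fs/file-max

    # currently allocated handles, allocated-but-unused, and the maximum
    cat /proc/sys/fs/file-nr

    # per-process ceiling that a soft/hard nofile limit can be raised to
    cat /proc/sys/fs/nr_open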