Filtering in promtail

I am using promtail to push logs from several bare-metal servers to Loki, and I do the filtering in Loki, for instance:
{job="ubuntu_server01_varlogs"} |~ "[Ee]rror" !~"Read_Error_Rate" !~"ubuntu-advantage-timer" ...

However, Loki now repeatedly becomes overwhelmed, and promtail hits the ingestion rate limit:
2021-12-03 09:55:33 Dec 3 09:55:32 server01 promtail-linux-amd64[2205]: level=warn ts=2021-12-03T08:55:32.480069794Z caller=client.go:344 component=client host=10.0.0.21:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) while attempting to ingest '8055' lines totaling '1048458' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
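
The message says the limit can be raised on the Loki side. If raising it is acceptable, something like the following in the Loki configuration should do it; this is only a sketch, the values are examples, and the default the message refers to is 4 MB/s:

limits_config:
  # Per-tenant ingestion rate; the default 4 MB/s is the 4194304 bytes/sec in the 429 above.
  ingestion_rate_mb: 8
  # Allow short bursts above the sustained rate, e.g. when promtail replays files at startup.
  ingestion_burst_size_mb: 16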
Perhaps it is better to filter out unwanted messages in promtail before pushing them to Loki? If so, how do I filter based on the log message content? My promtail config file currently looks like this:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://10.0.0.21:3100/loki/api/v1/push

scrape_configs:
- job_name: server01
  static_configs:
  - targets:
      - localhost
    labels:
      job: ubuntu_server01_varlogs
      __path__: "/var/log/*log"

Thanks.

Hi @hubba,

Just out of curiosity, what sort of log volumes do you see? I was doing load tests and had no problem ingesting close to 10k log lines per second with just one distributor and one ingester node. I'm using microservices mode with S3 storage.

I also saw some 429 errors at the very start. Promtail replays all log files it finds, so initially there will be a flood of messages. I'm not sure if you can configure it to start from the tail of the log files and only ship new lines.

I looked at the rates, and the problem occurs at server startup, peaking at around 16k lines/sec. This is a bare-metal installation, both on the clients and on the machine running Loki and Grafana, so there is perhaps less headroom in terms of capacity?
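
One thing that might soften the startup burst is tuning the client batching and backoff in promtail. A sketch, assuming these client options are available in my promtail version and with example values only (backing off mostly spreads retries out rather than reducing the total volume):

clients:
  - url: http://10.0.0.21:3100/loki/api/v1/push
    # Wait up to this long to fill a batch before sending it.
    batchwait: 2s
    # Maximum batch size in bytes before a send is triggered.
    batchsize: 524288
    backoff_config:
      # Back off between retries when Loki answers with 429.
      min_period: 1s
      max_period: 30s
      max_retries: 10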

I believe I figured it out. I read through the Grafana Labs documentation on scraping, configuration, and the match pipeline stage; the one on configuration is particularly important, I think. Anyway, my promtail configuration file now looks like this:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://10.0.0.21:3100/loki/api/v1/push

scrape_configs:
- job_name: server01
  static_configs:
  - targets:
      - localhost
    labels:
      job: server01_varlogs
      __path__: "/var/log/*log"
  pipeline_stages:
  # Extract a "message" field into msg from lines that happen to be JSON.
  - match:
      selector: '{job="server01_varlogs"}'
      stages:
      - json:
          expressions:
            msg: message
  # Drop every line that does not contain "rror" (Error, error).
  - match:
      selector: '{job="server01_varlogs"} !~ ".*rror.*"'
      action: drop
      drop_counter_reason: promtail_non_error
  - output:
      source: msg
  # Drop the noisy lines mentioning ubuntu-advantage-timer.
  - match:
      selector: '{job="server01_varlogs"} |~ ".*ubuntu-advantage-timer.*"'
      action: drop
      drop_counter_reason: promtail_noisy_error
  # Ship the extracted msg as the log line when it is set.
  - output:
      source: msg

This filters out everything that doesn't contain "rror" and everything containing "ubuntu-advantage-timer". I still have to extend the filter to also keep lines containing critical, bug, fail, etc., but at least I got it working.
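
For the keyword extension, I think the first match selector can simply use a case-insensitive alternation. A sketch, where the keyword list and the counter name are only examples:

  - match:
      # Drop everything that does not mention one of the keywords (case-insensitive).
      selector: '{job="server01_varlogs"} !~ "(?i)(error|critical|fail|bug)"'
      action: drop
      drop_counter_reason: promtail_non_interesting

The ubuntu-advantage-timer match could probably also be replaced by promtail's standalone drop stage with an expression, which appears to do the same thing without needing a selector.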
