Improve my pipeline_stages

nyxtorm · June 5, 2024, 1:01pm

Hello,

I’m discovering Grafana, Loki and Promtail to process my Apache and Nginx logs.
I have created this pipeline_stages block, which works well and sets the level label depending on the value of http_code:

  pipeline_stages:
  - match:
      selector: '{job="apache"}'
      stages:
      - regex:
          expression: '^\S+ \S+ \S+ \S+ \S+ \S+ \S+ \[.+\] "\S+ \S+ \S+" (?P<http_code>\d{3}) \S+ "[^"]*" "[^"]*" \S+ \S+ In:\S+ Out:.+:.+pct. \S+$'
      - labels:
          http_code:

  - match:
      selector: '{job="apache", http_code=~"(2|3)\\d{2}"}'
      stages:
      - static_labels:
          level: 'info'
  - match:
      selector: '{job="apache", http_code=~"4\\d{2}"}'
      stages:
      - static_labels:
          level: 'warn'
  - match:
      selector: '{job="apache", http_code=~"5\\d{2}"}'
      stages:
      - static_labels:
          level: 'crit'

At present, I have to create the http_code label in the first match block so that the selectors of the following match blocks can use it to set the static_labels.

Is it possible to optimize my pipeline_stages so that:

I specify my regex only once, as at present
I don’t export the http_code label
My selectors for defining the value of my static_labels can retrieve the value of http_code directly from the regex

My aim is to lighten processing as much as possible by avoiding unnecessary label exports to Loki, and to group the processing in one place.

Thank you for your help!

nyxtorm · June 5, 2024, 3:28pm

New version of the file:

  pipeline_stages:
  - match:
      selector: '{job="apache"}'
      stages:
      - regex:
          expression: '^\S+ \S+ \S+ \S+ \S+ \S+ \S+ \[(?P<time>.+)\] "\S+ \S+ \S+" (?P<http_code>\d{3}) \S+ "[^"]*" "[^"]*" \S+ \S+ In:\S+ Out:.+:.+pct. \S+$'
      - labels:
          http_code:
      - match:
          selector: '{http_code=~"(2|3)\\d{2}"}'
          stages:
          - static_labels:
              level: 'info'
      - match:
          selector: '{http_code=~"4\\d{2}"}'
          stages:
          - static_labels:
              level: 'warn'
      - match:
          selector: '{http_code=~"5\\d{2}"}'
          stages:
          - static_labels:
              level: 'crit'
      - labeldrop:
          - http_code
      - timestamp:
          format: '2006-01-02T15:04:05-0700'
          source: time

I added the labeldrop stage, but I’m not sure that adding the label and then removing it is the best solution…

tonyswumac · June 5, 2024, 9:02pm

The only thing you need to parse is probably the timestamp. The rest I’d say you can just send to Loki as is, then use the pattern parser to parse the logs at query time.

For example, let’s say your Nginx logs look something like this:

127.0.0.1 - - [05/Jun/2024:20:59:50 +0000] "GET /api/something HTTP/1.1" 200

You could do:

{SELECTOR} | pattern `<_> <_> <_> [<_>] "<method> <path> <http_version>" <http_status>`
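
For the timestamp-only parsing, the Promtail side could be as small as the untested sketch below. It assumes the sample Nginx line above, a hypothetical job="nginx" selector and a capture group named time; the regex and the Go reference-time layout would need adjusting to your actual log format.

```yaml
pipeline_stages:
- match:
    selector: '{job="nginx"}'   # assumed job name, replace with your own
    stages:
    # Capture only the bracketed timestamp; the rest of the line is left unparsed.
    - regex:
        expression: '^\S+ \S+ \S+ \[(?P<time>[^\]]+)\]'
    # Go reference-time layout matching e.g. 05/Jun/2024:20:59:50 +0000
    - timestamp:
        source: time
        format: '02/Jan/2006:15:04:05 -0700'
```

No labels are created here, so the stream keeps only the labels it already had from the scrape config.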

Lastly, a small nitpick. In my opinion you should not be setting the level label based on your Nginx HTTP status. The level is supposed to denote whether the logs themselves are info or warn, not the content of the logs. But this is of course just my opinion.
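
If you still want a level label without ever exporting http_code, one possible approach is the template stage, which works on the extracted data instead of on labels. The sketch below is untested and makes a few assumptions: it reuses the regex from the first post, it writes the result to a new extracted key named level, and it relies on the standard Go template ge comparison working lexicographically on the three-digit code string.

```yaml
pipeline_stages:
- match:
    selector: '{job="apache"}'
    stages:
    # Same regex as in the question; http_code only lands in the extracted data.
    - regex:
        expression: '^\S+ \S+ \S+ \S+ \S+ \S+ \S+ \[.+\] "\S+ \S+ \S+" (?P<http_code>\d{3}) \S+ "[^"]*" "[^"]*" \S+ \S+ In:\S+ Out:.+:.+pct. \S+$'
    # Derive level from the extracted http_code (string comparison, 3 digits assumed).
    - template:
        source: level
        template: '{{ if ge .http_code "500" }}crit{{ else if ge .http_code "400" }}warn{{ else }}info{{ end }}'
    # Only level becomes a label; http_code is never exported, so no labeldrop is needed.
    - labels:
        level:
```

With this layout the regex runs once and the three nested match blocks go away. Note that 1xx codes would also fall into info here, and lines that do not match the regex should end up without a level label, which is worth checking with promtail's dry-run mode before relying on it.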

system · June 5, 2025, 9:02pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
