Send only a subset of lines to Loki via Promtail, not all of them (like a "grep")

Hi

I am using Promtail to analyse a heterogeneous log file that looks more or less like this:

 SELL O -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SELL P -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SELL R -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SELL E -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SPS2 F IV -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SPS2 F IV -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SPS2 H -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 WARNING: 864570_2 SCIAR G -- was requested multiple or was not found.
 WARNING: 864570_2 SCIAR H -- multiplication was requested or was not found.
 WARNING: 864570_2 SCIAR L -- was requested multiple or was not found.
 WARNING: 864570_2 SCIAR T -- was requested in multiplication mode, or was not found.
 SPS2 F IV -- 16000101 00:00:00 UTC save: 20240503 14:08:33.83 87
 SPS2 F IV -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SPS2 H -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 WARNING: 864570_2 SCIAR G -- was requested multiple or was not found.
 WARNING: 864570_2 SCIAR H -- a multiplication was requested or was not found
 MATCH FOUND 20240503 13:00:30.61
 SALE O -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SELL P -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SELL R -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SELL E -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87
 SPS2 F IV -- 16000101 00:00:00.00 UTC save: 20240503 14:08:33.83 87

I created a Promtail configuration file:

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://host:3100/loki/api/v1/push

scrape_configs:
  - job_name: server1
    static_configs:
      - obiettivi:
          - localhost
        etichette:
          job: server1
          __path__: /var/logs/server1/mytest.log
    pipeline_stages:
      - match:
          selettore: '{job="server1"} |= "MATCH"'
          stages:
            - regex:
                expression: '^MATCH FOUND\s+(?P<time>\d{8}\s+\d{2}:\d{2}:\d{2}\.\d{2})'
            - timestamp:
                format: "20060102 15:04:05.99"
                source: time
                location: "Etc/UTC"
            - drop:
                older_than: 3h
                drop_counter_reason: "line_too_old"

The problem is that when I run the query {job="server1"} in Loki, I see every log line, not only the lines that contain the MATCH string. The extra lines are useless to me and take up disk space unnecessarily.

Is there any way to store only the necessary log lines in Loki? Something like a simple grep performed by Promtail.

Thank you.

Use a drop stage with a regex to drop the non-matching lines:

  - drop:
      expression: "^((?!MATCH FOUND).)*$"
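
One caveat with the expression above: as far as I know, Promtail compiles stage regexes with Go's RE2 engine, which does not support lookaheads such as (?!...), so this pattern may be rejected when the config is loaded. A possible alternative is a match stage whose selector uses the LogQL "does not contain" filter (!=) together with action: drop. This is only a sketch: it assumes the job label is server1 as in the question, and the drop_counter_reason value is a made-up example.

```yaml
pipeline_stages:
  # Drop every line that does NOT contain the literal text "MATCH FOUND".
  - match:
      selector: '{job="server1"} != "MATCH FOUND"'
      action: drop
      # Optional label for the dropped-entries counter; the value is hypothetical.
      drop_counter_reason: "not_a_match_line"
```

Because != is a plain substring test, it should not matter whether the log lines start with a space or not.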

ERRATA CORRIGE!

The config file was translated by my browser; this is the correct one:

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://host:3100/loki/api/v1/push

scrape_configs:
  - job_name: server1
    static_configs:
      - targets:
          - localhost
        labels:
          job: server1
          __path__: /var/logs/server1/mytest.log
    pipeline_stages:
      - match:
          selector: '{job="server1"} |= "MATCH"'
          stages:
            - regex:
                expression: '^MATCH FOUND\s+(?P<time>\d{8}\s+\d{2}:\d{2}:\d{2}\.\d{2})'
            - timestamp:
                format: "20060102 15:04:05.99"
                source: time
                location: "Etc/UTC"
            - drop:
                older_than: 3h
                drop_counter_reason: "line_too_old"

Hi @ssaldi

So Promtail always sends every line in the log to Loki? Aren't the match directives within pipeline_stages meant to filter which lines are sent to Loki? Do I have to exclude the other lines with a drop?

Thank you.

In my experience, no: the regex stage does not filter out lines, it only captures parts of them. But I'm also a newbie :slight_smile:

Actually, the documentation is not 100% clear. It just says: "The regex stage is a parsing stage that parses a log line using a regular expression. Named capture groups in the regex support adding data into the extracted map." (regex | Grafana Loki documentation)
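
To make the distinction concrete: with its default action, a match stage only decides which lines its nested stages run on, and lines that do not match the selector still go on to Loki unchanged. Below is a sketch of how the pipeline from the question might look with an explicit drop placed first, so the parsing stages only ever see the MATCH FOUND lines. The leading \s* in the regex is my assumption, added in case the real lines start with a space as in the sample; everything else is taken from the question.

```yaml
pipeline_stages:
  # 1) Discard everything that is not a "MATCH FOUND" line.
  - match:
      selector: '{job="server1"} != "MATCH FOUND"'
      action: drop
  # 2) Only MATCH FOUND lines reach this point: extract the timestamp text.
  - regex:
      expression: '^\s*MATCH FOUND\s+(?P<time>\d{8}\s+\d{2}:\d{2}:\d{2}\.\d{2})'
  # 3) Use the extracted value as the entry's timestamp.
  - timestamp:
      source: time
      format: "20060102 15:04:05.99"
      location: "Etc/UTC"
  # 4) Drop entries older than three hours, as in the original config.
  - drop:
      older_than: 3h
      drop_counter_reason: "line_too_old"
```

With this layout, the query {job="server1"} should return only the MATCH FOUND lines, but I have not tested it against a real Loki instance.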