Promtail - regex pipeline vs. pattern parser in Loki


For unstructured logs (from Microsoft IIS), should I (still) have a regex pipeline stage in the Promtail config, or should I just rely on the newer pattern parser introduced in Loki 2.3 (see the Grafana Labs blog post "New in Loki 2.3: LogQL pattern parser makes it easier to extract data from unstructured logs")? I'm not clear on whether the pattern parser should replace the Promtail regex pipeline or not. Please point me in the right direction.


For this working promtail config:

    server:
      #http_listen_port: 0
      http_listen_port: 9080
      grpc_listen_port: 0

    positions:
      filename: C:\promtail\positions.yaml

    clients:
      - url:

    scrape_configs:
      - job_name: iis
        # In order to monitor site logs, insert another
        # static_config like the sample below
        static_configs:
          - targets:
              - localhost
            labels:
              job: iis
              #instance: <your-instance-name>
              #site: <your-site1-name>
              __path__: C:/inetpub/logs/LogFiles/W3SVC6/*.log
        pipeline_stages:
          - match:
              # Drop lines that are comments (start with #)
              selector: '{job="iis"} |~ "^#"'
              action: drop
              drop_counter_reason: iis_comment_line
          - match:
              selector: '{job="iis"}'
              stages:
                - regex:
                    expression: '(?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}) (?P<server_ip>\S+) (?P<method>\S+?) (?P<cs_uri_stem>\S+?) (?P<cs_uri_query>\S+?) (?P<s_port>\S+?) (?P<cs_username>\S+?) (?P<c_ip>\S+?) (?P<cs_User_Agent>\S+?) (?P<cs_referer>\S+?) (?P<cs_host>\S+?) (?P<sc_status>\S+?) (?P<sc_substatus>\S+?) (?P<sc_win32_status>\S+?) (?P<sc_bytes>\S+?) (?P<cs_bytes>\S+?) (?P<time_taken>\S+?)'
                - timestamp:
                    source: timestamp
                    format: "2006-01-02 15:04:05"
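For reference, the query-time alternative I'm considering would be something like the sketch below: drop the regex stage from Promtail and extract the same fields with the pattern parser at query time. Field names are taken from my regex captures and the order is an assumption about my W3C log format; note the pattern parser splits on the literal spaces, so the two-token timestamp would become separate `<date>` and `<time>` captures:

```logql
{job="iis"}
  | pattern `<date> <time> <server_ip> <method> <cs_uri_stem> <cs_uri_query> <s_port> <cs_username> <c_ip> <cs_user_agent> <cs_referer> <cs_host> <sc_status> <sc_substatus> <sc_win32_status> <sc_bytes> <cs_bytes> <time_taken>`
```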

What are you looking to do? Do you have an example?

If you are just looking to get logs into Loki without any parsing, you actually don't need either of those.

@tonyswumac Well, the regex has more than a dozen named captures. Is there any point in keeping the regex in the Promtail pipeline if the pattern parser can extract the same fields as labels at query time? Q: In what scenario should I still use a regex in the Promtail pipeline if the pattern parser does the same thing, but better, conceptual differences aside?

This comes down to a bit of personal preference, so my opinions are my own.

In general, I prefer to keep the ingestion pipeline as simple as possible, which means parsing logs as little as possible during ingestion. The things I would parse logs for are: filtering out logs to drop, setting an accurate timestamp, or extracting labels that are useful. I will also parse logs at ingestion if there are things I can't do in LogQL, which happens sometimes.

In your example, the only parsing you are doing is for the timestamp, and I'd say that's good enough unless you see labels that would be useful to extract. The way to determine this is the Best practices page in the Grafana Loki documentation. Because Loki recommends caution about having too many labels, I'd use labels only as a way to distinguish log streams (such as environment, instance ID, log level, cloud account ID, etc.), and parse the rest at query time using LogQL's pattern or regexp parsers.
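To make that concrete, here's a sketch of the kind of query-time parsing I mean: keep only stream labels like `job` at ingestion, and extract fields with the pattern parser only in the queries that need them. The field names and their order here are assumptions based on the regex stage in your config:

```logql
# Rate of HTTP 5xx responses over 5m, with sc_status extracted at query time
sum(rate({job="iis"}
  | pattern `<date> <time> <server_ip> <method> <cs_uri_stem> <cs_uri_query> <s_port> <cs_username> <c_ip> <cs_user_agent> <cs_referer> <cs_host> <sc_status> <sc_substatus> <sc_win32_status> <sc_bytes> <cs_bytes> <time_taken>`
  | sc_status =~ "5.."
  [5m]))
```

This way a change in what you want to extract is just a query edit, not a change to the ingestion pipeline.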


@tonyswumac Thanks, that gets me over the hump and in the right direction.
