Promtail - rewriting output (and using internal labels)

Best practice with Loki is to create as few labels as possible and to rely on the power of stream queries instead. Loki's guidance even suggests that a small number of labels, each with a small number of values, can cause problems, because every unique combination of label values creates a separate stream.

Therefore, when scraping syslog, it seems sensible not to create labels for every internal syslog field.

That leaves the problem of how to retain that data, since it is simply discarded if no relabel config maps it to a label.

One option might be to rewrite the message content to include this data, for example by putting the facility or severity into the message itself rather than into a label.

The problem seems to be that the internal labels are not available in either the replace stage or templates.

Example, with an input syslog message of “hello world”:

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 127.0.0.1:1514
      idle_timeout: 60s
      label_structured_data: true
      labels:
        job: "syslog"
        host: myhostname
    pipeline_stages:
      - replace:
          expression: "(?P<content>.*)"
          replace: '[{{ .__syslog_message_severity }}] {{ .Value }}'
```

This results in the following output:

```text
[<no value>] hello world
```

Am I going about rewriting the message content the wrong way, or is there a problem with using internal labels in replace or template stages?
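A possible explanation, offered as a hedged sketch rather than a confirmed answer: Promtail removes every label whose name starts with `__` once `relabel_configs` has run, so by the time the pipeline stages execute, `__syslog_message_severity` no longer exists and the template renders `<no value>`. One workaround, assuming that labels which survive relabeling are visible by name to the `replace` template, is to relabel the internal fields to ordinary labels, reference those in the template, and then remove them with a `labeldrop` stage so they are not stored as labels. The label names `severity` and `facility` below are illustrative choices, not names Promtail requires.

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 127.0.0.1:1514
      idle_timeout: 60s
      label_structured_data: true
      labels:
        job: "syslog"
        host: myhostname
    relabel_configs:
      # Copy the internal syslog fields to ordinary (non "__") labels so they
      # survive relabeling and reach the pipeline stages.
      - source_labels: ['__syslog_message_severity']
        target_label: 'severity'
      - source_labels: ['__syslog_message_facility']
        target_label: 'facility'
    pipeline_stages:
      # Assumption: label values are readable in the template by label name.
      - replace:
          expression: "(?P<content>.*)"
          replace: '[{{ .severity }}] [{{ .facility }}] {{ .Value }}'
      # Drop the temporary labels so Loki does not index them.
      - labeldrop:
          - severity
          - facility
```

The order of the stages matters in this sketch: the `replace` stage has to run while the temporary labels still exist, and `labeldrop` removes them only afterwards.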


Hi, did you solve this problem?

I have this exact issue, and so far I have found a way of retaining the information using the pack stage.

For example:

```yaml
- job_name: syslog
  syslog:
    listen_address: 0.0.0.0:1514 # make sure you also expose this port on the container
    idle_timeout: 60s
    label_structured_data: false
    labels:
      job: "syslog"
  relabel_configs:
    - source_labels: ['__syslog_message_hostname']
      target_label: 'host'
    - source_labels: ['__syslog_message_app_name']
      target_label: 'app_name'
    - source_labels: ['__syslog_message_proc_id']
      target_label: 'proc_id'
    - source_labels: ['__syslog_message_msg_id']
      target_label: 'msg_id'
    - source_labels: ['__syslog_connection_ip_address']
      target_label: 'ip_address'
    - source_labels: ['__syslog_connection_hostname']
      target_label: 'hostname'
    - source_labels: ['__syslog_message_severity']
      target_label: 'severity'
    - source_labels: ['__syslog_message_facility']
      target_label: 'facility'
  pipeline_stages:
    - match:
        selector: '{job="syslog"}' # match all
        stages:
          - pack:
              labels:
                - proc_id
                - msg_id
                - ip_address
                - hostname
                - severity
                - facility
```

This results in output in JSON format. It also means that if the original message was already JSON, you end up with JSON inside JSON as an escaped string, which is not ideal.

Here is an example: