Loki Ruler Alert — Missing Label Value (log message) in Alertmanager but visible in Grafana

I’m trying to set up a Loki alert rule that fires whenever a certain pattern like error appears in systemd-journal logs. In the alert notification, I also want to include the full log message that matched the pattern as a label (so that Alertmanager can display it in the alert).

Here’s what I have so far.

groups:
  - name: node-log-errors
    rules:
     - alert: MyAlertForNode
       expr: |
             sum by (node, logmsg) (
             count_over_time(
              {job="systemd-journal"} |= "error"
              | regexp "(?P<logmsg>.+)"
              | label_format logmsg="{{.logmsg}}"
             [1m]
             )
             ) > 0
       for: 1m
       labels:
         severity: critical
          node: '{{{{ $labels.node }}}}'
          error_summary: '{{{{ $labels.logmsg }}}}'
        annotations:
          summary: "Error log detected on node {{{{ $labels.node }}}}"
         description: |
                Log message: {{{{ $labels.logmsg }}}}
                Please investigate this issue on node {{{{ $labels.node }}}}.
  • What works: When I test this LogQL expression in Grafana (via the Explore view, using Loki as a data source), I can see the logmsg label and its value correctly — the full log message appears as expected.

  • What doesn’t work: After deploying the alert rule, the alert fires and shows up in Alertmanager, but the logmsg label (and consequently error_summary) is missing from the alert labels.

Why does the logmsg label (which is visible in the Loki query results) not get propagated to Alertmanager when the alert fires?
Is this a limitation in how Loki ruler handles dynamic labels from regexp / label_format?
Or do I need to adjust the query or alert definition to persist these labels?

Any guidance or examples on how to include the actual log message in alert labels or annotations would be appreciated.

Thanks!

I noticed you are surrounding your label variables with four curly brackets. Is that because of Ansible? If so, perhaps you also need to do the same for this line:

             | label_format logmsg="{{.logmsg}}"

Hi, thanks for pointing that out.

Yes, I’m using Helm templating, which is why you’re seeing the 4 curly brackets. I corrected the templating accordingly, but I’m still not getting the intended result - the logmsg label is not propagated to Alertmanager.
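
For reference, the escaping I ended up with wraps the inner template in a Go raw string inside the Helm action, so Helm prints it literally instead of trying to evaluate it. For example, for the node label:

     node: '{{ `{{ $labels.node }}` }}'

which Helm renders as node: '{{ $labels.node }}' in the generated rule file.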

I tried again with the following rule:

 - name: node-log-errors
   rules:
   - alert: MyAlertForNode
     expr: sum by (node) (count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" | label_format logmsg="{{ `{{ $labels.logmsg }}` }}"[1m]) ) > 0
     for: 30s
     labels:
        severity: critical
        node: '{{ `{{ $labels.node }}` }}'
        error_summary: '{{ `{{ $labels.logmsg }}` }}'
     annotations:
        summary: "Error log detected on node {{ `{{ $labels.node }}` }}"
        description: |
             Log message: {{ `{{ $labels.logmsg }}` }}
             Please investigate this issue on node {{ `{{ $labels.node }}` }}

From what I’ve read online, this seems to be a limitation of Loki Ruler alerting: the alerting metrics (ALERTS in Prometheus) only propagate labels that are part of the final aggregation (sum by), and dynamic labels extracted via regexp, pattern, or label_format are dropped. As a result, Alertmanager only receives node, while logmsg never makes it through.
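
In other words (a minimal sketch of my understanding, using the same selector and parser as above, Helm escaping left out):

    # "node" is a grouping label of the outer aggregation, so it can reach the alert;
    # logmsg is extracted by the parser but dropped by the aggregation:
    sum by (node) (count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" [1m])) > 0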

To work around this, I also tried creating a recording rule first to materialize the log message as a persistent label, and then alerting on that recorded metric - but this also didn’t work. Here’s what I tried:

  - name: systemd-journal-errors-recording
    interval: 1m
    rules:
       # Step 1: Record metric with logmsg label (materializes it)
      - record: log_errors:count_by_node_logmsg
        expr: |
          sum by (node, logmsg) (
          count_over_time(
          {job="systemd-journal"} |= "err" 
          | regexp `(?P<logmsg>.+)`
          | label_format logmsg="{{ `{{ $labels.logmsg }}` }}"
          [1m]
          )
          ) > 0
       # Step 2: Alert on the recorded metric
      - alert: SystemdJournalErrorDetected
        expr: log_errors:count_by_node_logmsg > 0
        for: 1m
        labels:
            severity: critical
        annotations:
            summary: "Error on {{ `{{ $labels.node }}` }}: {{ `{{ $labels.logmsg }}` }}"
            description: |
                Log message: {{ `{{ $labels.logmsg }}` }}
                Please investigate this issue on node {{ `{{ $labels.node }}` }}

At this point, I’m essentially out of options. Is there any supported or recommended pattern to include the matched log content in alerts (perhaps only via annotations, or by relying on external tooling or integrations)?

Any guidance or clarification would be greatly appreciated.

Thanks!

It is true that you need to have the labels in your aggregation in order for them to show up in your alert. I would just add logmsg to the aggregation in your original alert instead of using a recording rule (if you use a recording rule, you need to create the alert on your Prometheus cluster instead).

Something like this should work:

 - name: node-log-errors
   rules:
   - alert: MyAlertForNode
     expr: sum by (node, logmsg) (count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" | label_format logmsg="{{ `{{ .logmsg }}` }}"[1m]) ) > 0
     for: 30s
     labels:
        severity: critical
        node: '{{ `{{ $labels.node }}` }}'
        error_summary: '{{ `{{ $labels.logmsg }}` }}'
     annotations:
        summary: "Error log detected on node {{ `{{ $labels.node }}` }}"
        description: |
             Log message: {{ `{{ $labels.logmsg }}` }}
             Please investigate this issue on node {{ `{{ $labels.node }}` }}
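
And for completeness, if you did prefer the recording-rule route, the alert itself would have to live in your Prometheus rule files rather than in the Loki ruler, assuming the Loki ruler's remote_write is configured to push the recorded samples to that Prometheus. A rough sketch (group name is just a placeholder; Helm escaping omitted for readability, you would need it again if this file is also templated by Helm):

groups:
  - name: systemd-journal-errors-alerts
    rules:
      # Prometheus evaluates this against the remote-written series,
      # where logmsg is already an ordinary label
      - alert: SystemdJournalErrorDetected
        expr: log_errors:count_by_node_logmsg > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Error on {{ $labels.node }}: {{ $labels.logmsg }}"
          description: |
            Log message: {{ $labels.logmsg }}
            Please investigate this issue on node {{ $labels.node }}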