Loki Ruler Alert — Missing Label Value (log message) in Alertmanager but visible in Grafana

I’m trying to set up a Loki alert rule that fires whenever a certain pattern, such as "error", appears in systemd-journal logs. In the alert notification, I also want to include the full log message that matched the pattern as a label, so that Alertmanager can display it in the alert.

Here’s what I have so far.

groups:
  - name: node-log-errors
    rules:
     - alert: MyAlertForNode
       expr: |
             sum by (node, logmsg) (
             count_over_time(
              {job="systemd-journal"} |= "error"
              | regexp "(?P<logmsg>.+)"
              | label_format logmsg="{{.logmsg}}"
             [1m]
             )
             ) > 0
       for: 1m
       labels:
         severity: critical
          node: '{{{{ $labels.node }}}}'
          error_summary: '{{{{ $labels.logmsg }}}}'
       annotations:
          summary: "Error log detected on node {{{{ $labels.node }}}}"
         description: |
                Log message: {{{{ $labels.logmsg }}}}
                Please investigate this issue on node {{{{ $labels.node }}}}.

  • What works: When I test this LogQL expression in Grafana (in the Explore view, with Loki as the data source), I can see the logmsg label and its value correctly; the full log message appears as expected.

  • What doesn’t work: After deploying the alert rule, the alert fires and appears in Alertmanager, but the logmsg label (and consequently error_summary) is missing from the alert labels.

Why does the logmsg label (which is visible in the Loki query results) not get propagated to Alertmanager when the alert fires?
Is this a limitation in how Loki ruler handles dynamic labels from regexp / label_format?
Or do I need to adjust the query or alert definition to persist these labels?

Any guidance or examples on how to include the actual log message in alert labels or annotations would be appreciated.

Thanks!

I noticed you are surrounding your label variables with 4 curly brackets, is that because of Ansible? If so, perhaps you also need to do the same for this line?

             | label_format logmsg="{{.logmsg}}"

Hi, thanks for pointing that out.

Yes, I’m using Helm templating, which is why you’re seeing the 4 curly brackets. I corrected the templating accordingly, but I’m still not getting the intended result - the logmsg label is not propagated to Alertmanager.

I tried again with the following rule:

 - name: node-log-errors
   rules:
   - alert: MyAlertForNode
     expr: sum by (node) (count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" | label_format logmsg="{{ `{{ $labels.logmsg }}` }}"[1m]) ) > 0
     for: 30s
     labels:
        severity: critical
        node: '{{ `{{ $labels.node }}` }}'
        error_summary: '{{ `{{ $labels.logmsg }}` }}'
     annotations:
        summary: "Error log detected on node {{ `{{ $labels.node }}` }}"
        description: |
             Log message: {{ `{{ $labels.logmsg }}` }}
             Please investigate this issue on node {{ `{{ $labels.node }}` }}

From what I’ve read online, this seems to be a limitation of Loki Ruler alerting: the alerting metrics (ALERTS in Prometheus) only propagate labels that are part of the final aggregation (sum by), and dynamic labels extracted via regexp, pattern, or label_format are dropped. As a result, Alertmanager only receives node, while logmsg never makes it through.
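If that limitation is the real cause, the direct fix would presumably be just to include logmsg in the final aggregation. A sketch of what I mean (the log pipeline is elided here and unchanged from the rule above; I haven't verified this end to end):

```yaml
# Sketch (unverified): only the aggregation changes; by keeping logmsg
# in the outer "sum by", it should remain on the result series and
# therefore in the alert's label set.
expr: >
  sum by (node, logmsg) (
    count_over_time( ... same pipeline as above ... [1m])
  ) > 0
```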

To work around this, I also tried creating a recording rule first to materialize the log message as a persistent label, and then alerting on that recorded metric - but this also didn’t work. Here’s what I tried:

  - name: systemd-journal-errors-recording
    interval: 1m
    rules:
       # Step 1: Record metric with logmsg label (materializes it)
      - record: log_errors:count_by_node_logmsg
        expr: |
          sum by (node, logmsg) (
          count_over_time(
          {job="systemd-journal"} |= "err" 
          | regexp `(?P<logmsg>.+)`
           | label_format logmsg="{{ `{{ $labels.logmsg }}` }}"
          [1m]
          )
           ) > 0
       # Step 2: Alert on the recorded metric
      - alert: SystemdJournalErrorDetected
        expr: log_errors:count_by_node_logmsg > 0
        for: 1m
        labels:
            severity: critical
        annotations:
            summary: "Error on {{ `{{ $labels.node }}` }}: {{ `{{ $labels.logmsg }}` }}"
            description: |
                Log message: {{ `{{ $labels.logmsg }}` }}
                Please investigate this issue on node {{ `{{ $labels.node }}` }}

At this point, I’m essentially out of options. Is there any supported or recommended pattern to include the matched log content in alerts (perhaps only via annotations, or by relying on external tooling or integrations)?

Any guidance or clarification would be greatly appreciated.

Thanks!

It is true that you need to have the labels in your aggregation in order for them to show up in your alert. I would just add logmsg to the aggregation in your original alert instead of using a recording rule (if you use a recording rule, you need to create the alert on your Prometheus cluster instead).

Something like this should work:

- name: node-log-errors
  rules:
   - alert: MyAlertForNode
     expr: sum by (node, logmsg) (count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" | label_format logmsg="{{ `{{ $labels.logmsg }}` }}"[1m]) ) > 0
     for: 30s
     labels:
        severity: critical
        node: '{{ `{{ $labels.node }}` }}'
        error_summary: '{{ `{{ $labels.logmsg }}` }}'
     annotations:
        summary: "Error log detected on node {{ `{{ $labels.node }}` }}"
        description: |
             Log message: {{ `{{ $labels.logmsg }}` }}
             Please investigate this issue on node {{ `{{ $labels.node }}` }}

Thanks Tony - I did try aggregating by logmsg as suggested, but unfortunately I’m still not able to get the log message into the alert.

Here’s what I’m seeing on my side:

  1. When I use sum by (node, logmsg) and keep
    label_format logmsg="{{ {{ $labels.logmsg }} }}"
    in the expression, no alerts fire at all, even though I can see matching logs in Grafana for the same query.

  2. If I change label_format to something like:
    label_format logmsg="{{.logmsg}}"
    or
    label_format logmsg="{{labels.logmsg}}"
    then the alert does fire - but the logmsg label in Alertmanager is empty and not the actual log content.

From what I can tell, this seems to be because label_format doesn’t evaluate templates at all, and any {{ }} that makes it into LogQL is treated as a literal string (or causes the alert to stop firing). That leaves me unable to materialise the extracted logmsg value in a way that survives alert evaluation.

At this point, even with logmsg included in the aggregation, I still can’t get the actual log message into the alert, so the original problem remains unresolved. If there’s a supported pattern to reference the captured regex label without using label_format (or a known limitation here), I’d really appreciate clarification.
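
One variant I haven't ruled out yet: since the regexp stage already adds logmsg as an extracted label, the label_format stage may be redundant, and dropping it would sidestep the template-escaping problem entirely. A sketch (unverified; it assumes the regexp capture alone is enough to expose logmsg to the aggregation):

```yaml
# Sketch (unverified): rely on the regexp parser alone to create the
# logmsg label; with no label_format stage there is nothing in the
# LogQL expression that needs Go-template escaping under Helm.
- alert: MyAlertForNode
  expr: >
    sum by (node, logmsg) (
      count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" [1m])
    ) > 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Error on {{ `{{ $labels.node }}` }}: {{ `{{ $labels.logmsg }}` }}"
```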

Thanks again for the guidance

Can you share a screenshot of the query result from logcli or grafana explore, with logmsg in the aggregation, please?

Please find the requested screenshots attached.

Maybe try this:

- name: node-log-errors
  rules:
   - alert: MyAlertForNode
     expr: sum by (node, logmsg) (count_over_time({job="systemd-journal"} |= "err" | regexp "(?P<logmsg>.+)" | label_format logmsg="{{ `{{ .logmsg }}` }}"[1m]) ) > 0
     for: 30s
     labels:
        severity: critical
        node: '{{ `{{ $labels.node }}` }}'
        error_summary: '{{ `{{ $labels.logmsg }}` }}'
     annotations:
        summary: "Error log detected on node {{ `{{ $labels.node }}` }}"
        description: |
             Log message: {{ `{{ $labels.logmsg }}` }}
             Please investigate this issue on node {{ `{{ $labels.node }}` }}

I re-read your previous comment, and wanted to point this out and see if this is perhaps the issue.

In the actual query for your alert, you need to use {{ .logmsg }} with label_format, because that part is a Loki query and has to follow LogQL's template syntax. {{ $labels.logmsg }} belongs in the rest of the ruler configuration, such as the error summary and description.

If this still doesn’t work, I’d check on ruler logs and your alerting destination and see if there is any log there as well.

I tried using logmsg="{{ `{{ .logmsg }}` }}" as suggested, but I'm still not receiving any alerts in Alertmanager. This could be because the expression isn't evaluating to true at the moment, even though the syntax is correct.

Here are the loki-ruler pod logs for reference:
level=info ts=2026-01-09T04:24:59.557351713Z caller=compat.go:68 user=fake rule_name=MyAlertForNode rule_type=alerting query="(sum by (node,logmsg)(count_over_time({job=\"systemd-journal\"} |= \"err\" | regexp \"(?P<logmsg>.+)\" | label_format logmsg=\"{{ .logmsg }}\"[1m])) > 0)" query_hash=217285099 msg="evaluating rule"
level=info ts=2026-01-09T04:24:59.575875665Z caller=evaluator_local.go:67 msg="request timings" insight=true source=loki_ruler rule_name=MyAlertForNode rule_type=alerting total=0.017605119 total_bytes=5955547 query_hash=217285099
level=info ts=2026-01-09T04:25:59.557897619Z caller=compat.go:68 user=fake rule_name=MyAlertForNode rule_type=alerting query="(sum by (node,logmsg)(count_over_time({job=\"systemd-journal\"} |= \"err\" | regexp \"(?P<logmsg>.+)\" | label_format logmsg=\"{{ .logmsg }}\"[1m])) > 0)" query_hash=217285099 msg="evaluating rule"