Promtail stages docker and multiline

jortk · February 11, 2022, 2:57pm

Hello

Thanks in advance for any help and feedback.

Objective/Intro
I’m trying to achieve multiline logging on a container (docker) based installation (kubernetes cluster) using loki and promtail through helm charts. My solution is somewhat working, except that it does not handle multiline messages which are split by hitting max_lines. But I have to admit that my current setup is partially a workaround for other issues related to how docker json logging is handled in combination with multiline processing in promtail.

Implementation
I started from the documentation and have (some) applications insert the &ZeroWidthSpace; marker into their logs.
Example partial logs (java stack trace), as the basis of promtail processing.

{"log":"\u0026ZeroWidthSpace;2022-02-11 09:10:47.352 ERROR 1 --- [nio-8080-exec-2] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.time.format.DateTimeParseException: Text 'intentionallybreakingtimestamp' could not be parsed at index 0] with root cause\n","stream":"stdout","time":"2022-02-11T09:10:47.353602146Z"}
{"log":"\n","stream":"stdout","time":"2022-02-11T09:10:47.353639043Z"}
{"log":"java.time.format.DateTimeParseException: Text 'intentionallybreakingtimestamp' could not be parsed at index 0\n","stream":"stdout","time":"2022-02-11T09:10:47.353645616Z"}
{"log":"\u0009at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2046) ~[na:na]\n","stream":"stdout","time":"2022-02-11T09:10:47.353652811Z"}
{"log":"\u0009at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1948) ~[na:na]\n","stream":"stdout","time":"2022-02-11T09:10:47.353659141Z"}
{"log":"\u0009at java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:598) ~[na:na]\n","stream":"stdout","time":"2022-02-11T09:10:47.35366469Z"}
{"log":"\u0009at java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:583) ~[na:na]\n","stream":"stdout","time":"2022-02-11T09:10:47.353670223Z"}
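To make clear what the docker stage hands on for such a line, here is a minimal Python sketch. It assumes that plain JSON decoding (`json.loads`) is a fair stand-in for what the docker stage does with the `log` field; the log text is shortened from the first sample line above.

```python
import json

# First sample line, with the message text shortened for readability.
raw = r'{"log":"\u0026ZeroWidthSpace;2022-02-11 09:10:47.352 ERROR 1 --- request failed\n","stream":"stdout","time":"2022-02-11T09:10:47.353602146Z"}'

entry = json.loads(raw)
line = entry["log"]

# \u0026 is just an escaped '&', so the line starts with the literal
# text "&ZeroWidthSpace;" and not with the U+200B character.
starts_with_entity = line.startswith("&ZeroWidthSpace;")
contains_u200b = "\u200b" in line

# The decoded line still carries its own trailing newline.
has_trailing_newline = line.endswith("\n")

print(repr(line[:20]), starts_with_entity, contains_u200b, has_trailing_newline)
```

If that assumption holds, a `firstline` pattern has to match the literal entity text, and every entry reaching the multiline stage already ends in a newline.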

These are the stages I have come up with:

      - docker: {}
      - multiline:
        # Use this special stage to combine specifically tagged lines into one multiline message
          firstline: '^&ZeroWidthSpace;'
      - regex:
        # This regex adds the 'multiline' content to the extracted map
          expression: '^(?s)(?P<multiline>&ZeroWidthSpace;)'
      - labels:
        # Now label messages that have a 'multiline' in their extracted map
          multiline:
      - match:
        # Only filter messages that are multiline labelled
          selector: '{multiline="&ZeroWidthSpace;"}'
          stages:
            - replace:
              # Clean up zerospacewidth
                expression: '^(?s)(?P<zerowidthspace>&ZeroWidthSpace;)'
                replace: ''
            - replace:
              # Remove empty lines
                expression: '(?m)(?P<emptyline>^\s*\n)'
                replace: ''

Considerations/Issues

While working on this, my thought process was basically as follows:

It seems the docker{} stage has to come first, as it handles all the JSON conversion and creates the initial extracted map and labels (stream, timestamp). Putting the multiline stage before it does not seem possible, as the docker stage won’t handle the character conversion of a multiline message well.
Because of the docker/JSON handling, the &ZeroWidthSpace character is unfortunately not invisible (it arrives as the literal text &ZeroWidthSpace;, as in the sample logs) and therefore cannot be matched by the multiline stage with the regex ^\x{200B}
Each original and output message from the docker{} stage already has a newline (\n).
multiline stage inserts additional newlines, which seem to come from here; loki/multiline.go at main · grafana/loki · GitHub
This results in multiline messages which start with a &ZeroWidthSpace character and have an empty line every other line.
Given that the &ZeroWidthSpace character is not invisible and I have these empty lines, I then regex the multiline messages, apply the label and clean them up. However, the multiline stage can also split a message when it reaches the max_lines parameter (default 128). The overflow is emitted as a separate message that does not start with the marker, so it is not captured by my match stage and keeps its empty lines in the middle. An additional side effect is that original empty lines (see line 2 of the example logs) are also removed.

Help/Ideas?
Given the above, can you help me achieve the objective of robust multiline logging without the various issues described?
One feature request that might resolve this, is if the multiline stage could already populate an extracted map or label, including on any split messages (by the max_lines parameter).

jortk · February 17, 2022, 1:26pm

Any ideas, anyone? Basically, the path I’ve taken so far leads to 2 issues:

Long messages (>192 lines) are split, with all messages except the first having empty lines, every other line.
I lose original empty lines logged by applications, as I can’t distinguish between original ones and the empty lines produced by the multiline stage.
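The first issue can be illustrated with a simplified Python model of the multiline stage. This is a sketch under stated assumptions, not promtail's actual implementation: the hypothetical `multiline` function starts a new block on a first-line match or once the current block holds `max_lines` lines, and joins lines with a newline. A small `max_lines` of 4 is used so the split is easy to see.

```python
import re

FIRSTLINE = re.compile(r"^&ZeroWidthSpace;")

def multiline(lines, max_lines):
    """Simplified model: group lines into blocks, starting a new block on a
    first-line match or when the current block already holds max_lines lines."""
    blocks, current = [], []
    for line in lines:
        if current and (FIRSTLINE.search(line) or len(current) >= max_lines):
            blocks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks

# One marked first line followed by five stack frames (made-up frame names).
lines = ["&ZeroWidthSpace;ERROR something failed\n"]
lines += [f"\tat frame{i}\n" for i in range(5)]

blocks = multiline(lines, max_lines=4)

# Only the first block carries the marker, so only it would get the
# 'multiline' label and pass through the cleanup in the match stage.
marked = [bool(FIRSTLINE.search(b)) for b in blocks]
print(len(blocks), marked)
```

In this model the overflow block has no marker, so nothing downstream can recognise it as part of a multiline message, and its doubled newlines stay in place.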

I welcome any feedback!
