Grouping multiline stack traces together as one and correct labeling

turtle_man · January 27, 2025, 8:47pm

I am using Grafana Alloy to fetch, process, and send logs to Loki. The problem comes with tracebacks: a stack trace is split across newlines, and every line is treated as a separate log message. I have multiple custom log formats in use and cannot rely on a single one, so I have grouped them using:

  stage.multiline {
    firstline     = "(?i)^Traceback \\(most recent call last\\):"  // Define the first line of the traceback
    max_wait_time = "1s"  // Wait for additional lines to complete the group
  }

  // Step 2: Match logs starting with "Traceback (most recent call last):"
  stage.match {
    selector = "{msg =~ \"(?i)^Traceback \\(most recent call last\\):\"}"  // Filter logs that match the pattern
    stage.labels {
      values = {
        "traceback" = "traceback",  // Add the label "traceback = traceback"
      }
    }
  }

They mostly group the tracebacks correctly; the problem is the labeling, which doesn't seem to work. I have tried multiple different stages in loki.process. If anyone can provide any insight, that would be great.

tonyswumac · January 28, 2025, 6:48pm

Do you actually have a label called msg?
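The point of that question: `{msg =~ ...}` only matches streams that carry a label named `msg`, and `stage.labels` maps a label name to a key in the *extracted* data, so `"traceback" = "traceback"` looks up an extracted field called `traceback` that no stage ever sets. A minimal sketch of a working variant, assuming your streams carry a `job` label (hypothetical; use any label your streams actually have) and a `loki.write.default` component (also hypothetical), tests the log line with a LogQL line filter and attaches a constant label with `stage.static_labels`:

```alloy
loki.process "tracebacks" {
  // Hypothetical destination: point this at your own loki.write component.
  forward_to = [loki.write.default.receiver]

  stage.multiline {
    firstline     = "(?i)^Traceback \\(most recent call last\\):"
    max_wait_time = "1s"
  }

  stage.match {
    // A stream selector needs at least one label matcher (`job` is an assumption).
    // The `|=` line filter then checks the log line itself, not a label.
    selector = "{job=~\".+\"} |= \"Traceback (most recent call last):\""

    // static_labels sets a fixed value; stage.labels would read from extracted data instead.
    stage.static_labels {
      values = {
        traceback = "true",
      }
    }
  }
}
```

Note that `|=` is a case-sensitive substring match, unlike the `(?i)` regex in the original selector.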

turtle_man · January 28, 2025, 11:07pm

No, but I have ditched this config and moved on to this (I saw you suggest it in another post):

stage.multiline {
  firstline     = "(?i)^Traceback \\(most recent call last\\):"  // Start grouping on "Traceback (most recent call last):"
  max_wait_time = "1s"  // Max wait time for multiline grouping
}

// Step 2: Capture the Traceback header line into the extracted "level" field
stage.regex {
  expression = "(?i)^(?P<level>Traceback \\(most recent call last\\):).*"  // Match Traceback start
}

// Step 3: Process and clean the level (if needed) to lowercase
stage.template {
  source   = "level_cleansed"
  template = "{{ .level | ToLower }}"  // Ensure level is processed as lowercase
}

// Step 4: Set the "level" label from the cleaned value
stage.labels {
  values = {
    "level" = "level_cleansed",
  }
}

This works for the labeling, but on checking the tracebacks, some are not fully joined together: the last line is sometimes missing.
It also doesn't include the error name, which is on the line before the traceback, so that line is categorized as an error rather than as part of the traceback. Any suggestions for joining the error line with the traceback?

tonyswumac · January 29, 2025, 1:15am

Perhaps try making your max_wait_time slightly bigger? Also, can you share a sample of your logs, both normal lines and stack traces?

With multiline you need something that matches the start of each entry. This is typically a timestamp, because that's easy to match. For example:

2025-01-28T12:00:00Z normal log
2025-01-28T12:00:00Z normal log
2025-01-28T12:00:00Z Trace
    somethingsomething
    somethingsomething
2025-01-28T12:00:00Z normal log

This is clear and easy to match. But if you don't have that, and we used just the string Traceback (most recent call last): from your example, consider:

normal log
normal log
Traceback (most recent call last):
    somethingsomething
    somethingsomething
normal log

Multiline has no way to determine that the last normal log is not part of the traceback, because that line doesn't match the start-of-entry pattern. It relies purely on your max_wait_time, which, needless to say, is unreliable and prone to errors.
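As a sketch of this advice for logs shaped like the first example, where only the first line of each entry carries a timestamp, `firstline` can be anchored on that leading timestamp. The regex below is an assumption about the timestamp format, and the stage goes inside your existing `loki.process` block:

```alloy
stage.multiline {
  // A new entry starts only at a line beginning with an ISO 8601 timestamp,
  // e.g. "2025-01-28T12:00:00Z normal log". Lines without one (the indented
  // traceback frames) are appended to the entry currently being built.
  firstline     = "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}"
  // Now only a fallback: flushes the current entry if no new lines arrive.
  max_wait_time = "3s"
}
```

This only works if continuation lines do not carry the same timestamp prefix; with CRI output, where every line is prefixed, that prefix has to be dealt with first.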

turtle_man · January 29, 2025, 2:01am

So:
1) All lines start like this: 2025-01-29T01:41:02.309034003Z stdout F
2) Even the traceback lines are treated as new lines, each getting this CRI prefix of timestamp, "stdout"/"stderr", and "F".
3) The issue is also that I have multiple custom log formats (which is why I asked about multiple multiline stages working).
4) These formats have their own application-level timestamp field in addition to the CRI timestamp, "stderr", and "F".
5) Currently I have clipped the CRI timestamp so that tracebacks can be grouped easily, but it is still not 100%.
Here is an example of a traceback:
1. (Timestamp) (level some sort of error)
2. Traceback (most recent call last):
3. File "/app/elastic_search/filters.py", line 331, in search_queries
4. new_resp = _resp[0]
5. IndexError: list index out of range

This is how it arrives after I have stripped away the CRI timestamp, the stream info, and F.
So now line 1 becomes its own separate error entry, and the traceback runs from line 2 to line 5. In some cases it skips over line 5 and goes on to another non-error line (I assume because I cut off the timestamp portion; however, all logs seem to arrive in the correct chronological order).
What I understand is that the best way to avoid these issues is to account for all the custom log formats I use, so the traceback can simply be grouped together with the error line. Is it possible to use multiple stage.multiline stages? I tried testing it, and it worked in some cases but not in others, which led me to conclude that it did not work.

turtle_man · January 29, 2025, 2:07am

Or a second way seems to be to re-enable the CRI timestamp but somehow strip it from only the traceback lines. I'm not sure how I can form a regex of that sort.

tonyswumac · January 30, 2025, 3:34pm

I would recommend you keep the timestamp and use it as the start of the multiline.

Also, if you really want to strip the timestamp, you can always do that after the multiline stage.
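A minimal sketch of stripping the prefix after grouping, assuming each assembled entry still begins with the CRI prefix (`<timestamp> stdout|stderr F|P `). The field name `content` is a hypothetical choice, and these stages go after `stage.multiline` inside the same `loki.process` block:

```alloy
// Capture everything after the first line's CRI prefix. (?s) lets "." match
// the newlines that stage.multiline inserted between the joined lines.
stage.regex {
  expression = "(?s)^\\S+ (?:stdout|stderr) [FP] (?P<content>.*)$"
}

// Replace the log line with the captured remainder.
stage.output {
  source = "content"
}
```

This only removes the prefix at the very start of the entry; any prefixes on the continuation lines are still there unless they were removed before grouping.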

turtle_man · January 30, 2025, 9:35pm

Had a light-bulb moment while writing my previous comments: I kept the timestamps, removed them from any lines that start with a double space, "Traceback", "File", or "somethingError", and did the multiline based on the CRI timestamps. It works.
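The approach described above could be sketched roughly as follows. The regexes are assumptions that approximate the rules in the post (double space, "Traceback", "File", and `\w+Error` for lines like "IndexError: ..."); exceptions whose names don't end in "Error" would need extra alternatives, and `loki.write.default` is a hypothetical component name:

```alloy
loki.process "python_tracebacks" {
  // Hypothetical destination: point this at your own loki.write component.
  forward_to = [loki.write.default.receiver]

  // 1. Strip the CRI prefix only from lines that continue a traceback.
  //    stage.replace substitutes each capture group (here group 1, the prefix)
  //    with `replace`; the non-capturing group just decides whether it matches.
  stage.replace {
    expression = "^(\\S+ (?:stdout|stderr) [FP] )(?:  |Traceback|File|\\w+Error)"
    replace    = ""
  }

  // 2. Lines that still start with a CRI timestamp begin a new entry; the
  //    stripped lines are appended to the preceding one, so the application's
  //    error line and its traceback end up in a single entry.
  stage.multiline {
    firstline     = "^\\d{4}-\\d{2}-\\d{2}T\\S+ (?:stdout|stderr) [FP] "
    max_wait_time = "3s"
  }
}
```

Because the "Traceback" header also loses its prefix, it is joined onto the error line that precedes it, which is what the earlier question about adjoining the error name asked for.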
