Promtail regex does not match log lines with ANSI color codes

I am using regular expressions to process my logs.

My promtail config is:

- job_name: kubernetes-pods
  pipeline_stages:
    - docker: {}
    - regex:
         expression: '^(?P<timestamp>[0-9 :.-]+?) \| \\u001b\[0;00;[0-9]+m(?P<logLevel>[DIWEF])\\u001b\[0m \| (?P<traceID>.*) \| (?P<location>[^\s]+:\d+) \\u001b\[0;00;[0-9]+m(?P<message>.+)\\u001b\[0m.*$'
    - labels:
        timestamp: timestamp
        logLevel: logLevel
        traceID: traceID
        location: location
        message: message

A sample log line looks like this:

09-04 18:22:07.415 | \u001b[0;00;32mI\u001b[0m |  | handler.go:383 \u001b[0;00;32mget all robot stop plan info\u001b[0m\n

I get this message in the Promtail debug log:

level=debug ts=2023-09-06T01:27:27.313448004Z caller=regex.go:121 component=file_pipeline component=stage type=regex msg="regex did not match" input="09-04 18:22:07.415 | \u001b[0;00;32mI\u001b[0m |  | handler.go:383 \u001b[0;00;32mget all robot stop plan info\u001b[0m\n" regex="^(?P<timestamp>[0-9 :.-]+?) \\| \\\\u001b\\[0;00;[0-9]+m(?P<logLevel>[DIWEF])\\\\u001b\\[0m \\| (?P<traceID>.*) \\| (?P<location>[^\\s]+:\\d+) \\\\u001b\\[0;00;[0-9]+m(?P<message>.+)\\\\u001b\\[0m.*$"

But when I test the same regex in Go, it matches:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	logLine := `09-04 18:22:07.415 | \u001b[0;00;32mI\u001b[0m |  | handler.go:383 \u001b[0;00;32mget all robot stop plan info\u001b[0m\n`
	re := regexp.MustCompile(`^(?P<timestamp>[0-9 :.-]+?) \| \\u001b\[0;00;[0-9]+m(?P<logLevel>[DIWEF])\\u001b\[0m \| (?P<traceID>.*) \| (?P<location>[^\s]+:\d+) \\u001b\[0;00;[0-9]+m(?P<message>.+)\\u001b\[0m.*$`)
	match := re.FindStringSubmatch(logLine)

	if match != nil {
		groups := make(map[string]string)

		// Extract the named capture groups
		for i, name := range re.SubexpNames() {
			if name != "" {
				groups[name] = match[i]
			}
		}

		fmt.Printf("Timestamp: %s\n", groups["timestamp"])
		fmt.Printf("Log Level: %s\n", groups["logLevel"])
		fmt.Printf("Trace ID: %s\n", groups["traceID"])
		fmt.Printf("Location: %s\n", groups["location"])
		fmt.Printf("Message: %s\n", groups["message"])
	} else {
		fmt.Println("Log line does not match the expected format.")
	}
}

Output:

Timestamp: 09-04 18:22:07.415
Log Level: I
Trace ID: 
Location: handler.go:383
Message: get all robot stop plan info

Can anyone tell me where the problem is? Thanks!

Docker logs are usually in JSON format. Remove the regex stage so you can see the actual log lines, then check whether you need to apply a json stage first.

As for your regex, I did a quick test with your config and log line, and it seems to work. I am using Promtail 2.8.4 with this config:

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /tmp/test.log
  pipeline_stages:
    - regex:
        expression: '^(?P<timestamp>[0-9 :.-]+?) \| \\u001b\[0;00;[0-9]+m(?P<logLevel>[DIWEF])\\u001b\[0m \| (?P<traceID>.*) \| (?P<location>[^\s]+:\d+) \\u001b\[0;00;[0-9]+m(?P<message>.+)\\u001b\[0m.*$'
    - labels:
        timestamp: timestamp
        logLevel: logLevel
        traceID: traceID
        location: location
        message: message

Debug output:

level=info ts=2023-09-06T16:55:33.072740561Z caller=main.go:174 msg="Starting Promtail" version="(version=2.8.4, branch=HEAD, revision=89d282c43)"
level=warn ts=2023-09-06T16:55:33.072859494Z caller=promtail.go:265 msg="enable watchConfig"
2023-09-06T16:55:38.073953326+0000	{filename="/tmp/test.log", job="varlogs", location="handler.go:383", logLevel="I", message="get all robot stop plan info", timestamp="09-04 18:22:07.415", traceID=""}	09-04 18:22:07.415 | \u001b[0;00;32mI\u001b[0m |  | handler.go:383 \u001b[0;00;32mget all robot stop plan info\u001b[0m\n
level=info ts=2023-09-06T16:55:38.073663056Z caller=filetargetmanager.go:358 msg="Adding target" key="/tmp/test.log:{job=\"varlogs\"}"
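One possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: my test file presumably contains the characters `\u001b` as plain text, while a real container log may contain the ESC control byte (0x1b), which a quoted debug log would also render as `\u001b`. The Go sketch below uses a shortened version of your pattern to show that the two cases need different regexes. The names `literalRe` and `escRe` are only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalRe expects the six characters `\u001b` spelled out in the text,
// as in the original expression.
var literalRe = regexp.MustCompile(`\\u001b\[0;00;[0-9]+m(?P<logLevel>[DIWEF])\\u001b\[0m`)

// escRe expects the real ESC control byte (0x1b) instead.
var escRe = regexp.MustCompile(`\x1b\[0;00;[0-9]+m(?P<logLevel>[DIWEF])\x1b\[0m`)

func main() {
	// Raw string literal: backslash, 'u', '0', '0', '1', 'b' as plain text.
	literalLine := `| \u001b[0;00;32mI\u001b[0m |`
	// Interpreted string literal: \u001b becomes a single ESC byte.
	escLine := "| \u001b[0;00;32mI\u001b[0m |"

	fmt.Println(literalRe.MatchString(literalLine)) // true
	fmt.Println(literalRe.MatchString(escLine))     // false
	fmt.Println(escRe.MatchString(literalLine))     // false
	fmt.Println(escRe.MatchString(escLine))         // true
}
```

If the raw lines do turn out to contain ESC bytes, replacing `\\u001b` with `\x1b` in the Promtail expression is worth trying.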
