Loki/Promtail: Parse timestamp in pipeline doesn't work

Hi everyone,

I’m evaluating Loki for analyzing log files right now and have problems parsing the datetime that’s part of an apache access log line.

The log line looks something like that (ip address changed):

10.11.12.0 - - [28/Jan/2020:00:27:21 +0100] "GET / HTTP/1.1" 200 2963 "-" "worldping-api"

My regex looks like this:

- regex:
      expression: '(?P<clientAddr>\w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}) - - .(?P<timestamp>\d{2}.*\d{4}). "(?P<request>.*)" (?P<code>\d+) .* "(?P<referrer>.*)" "(?P<client>.*)"'

This seems to work; when setting up labels for all those matches they show up in Grafana explorer, and the timestamp field contains 28/Jan/2020:18:29:22 +0100.

But Promtail doesn’t seem to parse the datetime correctly, since in Grafana each log entry shows up with the time of when it was recorded (which doesn’t correlate to when the request was made, since the logs are written to the file with a delay).

I setup a timestamp parser like this:

- timestamp:
      source: timestamp
      format: "02/Jan/2006:03:04:05 -0700"

I don’t have much Go experience, but tried to setup the format string according to the time.Parse docs and validate it with http://gotime.agardner.me/parser.html – according to this, the format string should be allright.

Can somebody give me a hint about why this could not work and how to improve it?

Best,
Benjamin.

For reference, my whole pipeline stage looks like this:

  pipeline_stages:
  - match: 
      selector: '{filename="/home/alu/logs/access_log"}'
      stages:
      - regex:
          expression: '(?P<clientAddr>\w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}) - - .(?P<timestamp>\d{2}.*\d{4}). "(?P<request>.*)" (?P<code>\d+) .* "(?P<referrer>.*)" "(?P<client>.*)"'
      - labels:
          clientAddr:
          request:
          code:
          referrer:
          client:
          timestamp:
      - timestamp:
          source: timestamp
          format: "02/Jan/2006:03:04:05 -0700"
1 Like

Hey all,

sooo I figured it out in the mean time. It turns out that 03 in the time format string references 12h (am/pm) format, whereas my logfile contains the hour of the day in 24h format. Thus the format string to parse the apache access log datetime has to look like this: 02/Jan/2006:15:04:05 -0700 (notice the 15 instead of 03).

I was able to debug this by running promtail with -log.level=debug (and a very short demo log file), which I read about here: https://github.com/grafana/loki/issues/1330#issuecomment-559713373

Thanks to all contributors for this nice piece of software!

Best,
Benjamin.

2 Likes

I have a log line like this

1.1.1.1 - - [23/Jul/2022:14:24:00 +0200] "GET /app/login?return=x HTTP/1.0" 200 2975 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" **0/177102**

And promtail setting like

scrape_configs:
  - job_name: system
    pipeline_stages:   
    - regex:
        expression: '^(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) \[(?P<timestamp>[\w:/]+\s[+\-]\d{4})\] \"?(?P<action>\S+)\s?(?P<path>\S+)?\s?(?P<protocol>\S+)?\" (?P<status_code>\d{3}|-) (?P<size>\d+|-)\s?\"?(?P<referer>[^\"]*)\"?\s?\"?(?P<useragent>[^\"]*)?\" ?(?P<reponsetime>\S+)?'   
    - labels:
        action:
        status_code:
        date:
        time:
        ip:
        timestamp:           
    - timestamp:
        source: timestamp
        format: "02/Jan/2006:15:04:05 -0700"
    static_configs:
    - targets:
      - localhost
      labels:
       job: apache
       env: dev
       __path__: /var/log/access_ssl_log.processed*

This should parse the timestamp correctly. But when i go into Grafana there is no data, even if i select the whole Year. If i remove the timestamp and let it set the timestamp the time of scraping then in grafana i can see the data but it’s timestmap is the time of the parsing and not the time of the page visit (this is apache log)

1 Like

Me to facing same issue let me know if you get solution