Parsing timestamp from logline with promtail and sending to Loki

Hi there,

I am new to Grafana/Loki/Promtail and I am having a hard time for two months now getting something to work, which I assumed after going through the docs, reading though community here and even trying to have ChatGPT help me out would be an easy task.

I have log files with log lines like this:

2023/04/05 18:18:06 INFO : FilterImages/Img_011X5_2023_02_25-13_32_19.tiff.json: Copied (new)
2023/04/05 18:18:06 INFO : FilterImages/Img_011X5_2023_02_25-13_32_19.tiff.json: Deleted
2023/04/05 18:18:06 INFO : FilterImages/Img_011XH_2023_02_26-10_12_03.tiff.json: Copied (new)
2023/04/05 18:18:06 INFO : FilterImages/Img_011XH_2023_02_26-10_12_03.tiff.json: Deleted

This is the timestamp format in the log file: 2023/04/05 18:18:06

I use regex to capture it:
expression: ‘^(?P\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?PERROR|INFO) : (?P.*)$’

Initially I started with a simple timestamp stage:

  • timestamp:
    source: timestamp
    format: “2006/01/02 15:04:05”

Result was, that there were no results in the query any more…not even any labels made it into Grafana.

Many many hours and iterations later I decided to do the reformatting to Loki’s default timestamp myself hoping for better results.

This is my current promtail config:

server:
http_listen_port: 9080
grpc_listen_port: 0
log_level: debug

positions:
filename: C:\Grafana\LokiPromTrail\PromTailPositions\positions.yaml

clients:

scrape_configs:

  • job_name: rclone
    static_configs:
    • targets:
      • localhost
        labels:
        path: C:/Grafana/LokiPromTrail/rctest.log
        job: rclone
        versioncounter: 55
        pipeline_stages:
    • regex:
      expression: ‘^(?P\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?PERROR|INFO) : (?P.*)$’
    • template:
      source: extractedtimestamp
      #from 2006/01/02 15:04:05 to 2006-01-02 15:04:05
      template: ‘{{ Replace .Value “/” “-” 2 }}’
    • template:
      source: extractedtimestamp
      #from 2006-01-02T15:04:05 to 2006-01-02T15:04:05-07:00
      template: ‘{{ Replace .Value " " “T” 1 }}+02:00’
    • labels:
      reformattedtimestamp: extractedtimestamp
      level: level

      message: message

    • timestamp:
      source: extractedtimestamp
      format: RFC3339

      inital easy setup format: “2006/01/02 15:04:05”

As you can see I added template stages to transform the extracted timestamp to the RFC339 format. I also put the result into a label just so that I could see it in Grafana.

I am running promtail in debug level:
level=debug ts=2023-04-08T08:45:55.2500749Z caller=regex.go:132 component=file_pipeline component=stage type=regex msg=“extracted data debug in regex stage” extracteddata=“map[extractedtimestamp:2023/04/05 19:30:16 filename:C:/Grafana/LokiPromTrail/rctest.log job:rclone level:INFO message:BlemishImages/Blemish_LogFile_01171_2023_02_09-12_50_38.json: Copied (new) versioncounter:55]”

What I get in Grafana when doing a query is this: I see the labels and the content of the labels. The reformattedtimestamp label does show all the timestamps of the log lines.

I have changed the defaut loki config and added this:
limits_config:
reject_old_samples: false
reject_old_samples_max_age: 4w

I use the time range in the Grafana Explorer as now-7d to now.

But I get no results from a query whatsoever.

Removing the timestamp stage lead to perfect results (except the timestamp of course).

I am running on Windows with the latest versions of Grafana/Loki/Promtail available in Februar.

Any ideas where to look at?

A couple of things catch my eye.

Your regex doesn’t use named capture groups and you are matching more than the timestamp (I see the log level in there) this will fail at your timestamp stage because you have more than just the timestamp being matched to your parse expression (the log level is in there too)

You could probably remove anything but the timestamp info from the regex or use named capture groups. The docs have some examples regex | Grafana Loki documentation

The second issue you might have is your timestamp doesn’t have time zone info in it, you should explicitly set the time zone in the timestamp stage to make sure you are parsing the timestamp with the correct time zone.

Also check out the troubleshooting section, you can test your changes more easily with promtail locally to see how the stages are being applied Troubleshooting | Grafana Loki documentation

2 Likes

What @ewelch said.

Generally parsing timestamp comes in two stages if you are using regex:

  1. Regex group capture the timestamp part.
  2. Use the captured group as source.

In your example, it should be something like this (not tested):

- regex:
    expression: '^(?P<timestamp>\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) .*$'
- timestamp:
    source: timestamp
    format: '2006/01/02 15:04:05'

Thanks guys for taking the time to help me out.

I has set log_level: debug in promtail config and thought that would be it, but the --inspect flag really helped me tremendously. What a pitty I hadnt seen that before in the docs. Does provide great insight.

Turns out, it is not really a timestamp issue.

I was first confused into believing that is was timestamp related cause I saw this error message "level=debug ts=2023-04-11T08:18:38.2951782Z caller=timestamp.go:176 component=file_pipeline msg="extracted data did not contain a timestamp""

But that was due to a couple of corrupted lines 2023/04/11 9:37:46 INFO where the “0” was missing before the 9.

After running with the --inspect flag I could indeed see that all time timestamp parsing was indeed delivering the correct timestamp. That is true for the version where I converted the timestamp with tempalte stages to RFC339 as well in using the leaner version with just using the timestamp stage.

config with template stage converting:

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug

positions:
  filename: C:\Grafana\DataStore\Promtail\positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
- job_name: rclone
  static_configs:
  - targets:
      - localhost
    labels:
      __path__: C:/Grafana/LogFilesTest/rctest.log
      job: rclone
      versioncounter: 57
  pipeline_stages:
  - regex:
       expression: '^(?P<extractedtimestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<level>ERROR|INFO)  : (?P<message>.*)$'
  - template:
      source: extractedtimestamp
      #from 2006/01/02 15:04:05 to 2006-01-02 15:04:05
      template: '{{ Replace .Value "/" "-" 2 }}'
  - template:
      source: extractedtimestamp
      #from 2006-01-02T15:04:05 to 2006-01-02T15:04:05-07:00
      template: '{{ Replace .Value " " "T" 1 }}+02:00'
  - labels:
      reformattedtimestamp: extractedtimestamp
      level: level
      message: message
  - timestamp:
      source: extractedtimestamp
      format: RFC3339

config with simply using timezone stage:


server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug

positions:
  filename: C:\Grafana\DataStore\Promtail\positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
- job_name: rclone
  static_configs:
  - targets:
      - localhost
    labels:
      __path__: C:/Grafana/LogFilesTest/rctest.log
      job: rclone
      versioncounter: 58
  pipeline_stages:
  - regex:
       expression: '^(?P<extractedtimestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<level>ERROR|INFO)  : (?P<message>.*)$'
  - labels:
      reformattedtimestamp: extractedtimestamp
      level: level
      message: message
  - timestamp:
      source: extractedtimestamp
      format: "2006/01/02 15:04:05"
      location: Europe/Berlin

Additionally I figured to use the ‘preformatted text’ option when pasting code here. My first config paste surprisingly removed the tags for the capture groups.

Side note: I will remove the ‘message’ part now after having read the best practices on use of labels.

So I am very happy now having solved this riddle and now being able to do the real stuff in Grafana.

There is one thing though that I still need to solve: How far into the past the log lines are accepted. In my use case the content / log files a produced every night, but the logs are only available to processing next day.

I ran with --inspect flag and I was manually adding lines to the log file to do some testing. I figured that only log lines around something like two and a half hours back from now make it into being available to query in Grafana.

I did add limits_config to loki:

limits_config:
  reject_old_samples: false
  reject_old_samples_max_age: 4w

Is there somewhere else I need to make an adjustment to accepts logline for the past 24h?

Thanks
Peter

Hey Peter, did you ever find a way to make adjustments to accept loglines past 24h to a specified time?