Unable to extract info from __path__ or get Loki to use the log timestamp

Hi,

I am using Promtail to scrape logs, and I want to extract the client name, server, service, and log file name from the file path.
Then, depending on the service, I want to extract the timestamp from each log line so that Loki stores the entry under the log's own timestamp rather than the time it was received.

This is my Promtail config. Is something wrong with it?

scrape_configs:
  - job_name: system
    static_configs:
    - targets:
      - Europa
      labels:
        job: system-logs
        __path__: /var/log/*log
  - job_name: 'client-logs'
    static_configs:
    - targets:
      - Europa
      labels:
        job: 'client-logs'
        __path__: /data/ingester/*/*/*/*  # Match all log files, adjust path as needed
    relabel_configs:
      - source_labels: ["__path__"]
        regex: /data/ingester/([^/]+)/([^/]+)/([^/]+)/(.*)
        replacement: $1
        target_label: client
        action: replace
      - source_labels: ["__path__"]
        regex: /data/ingester/([^/]+)/([^/]+)/([^/]+)/(.*)
        replacement: $2
        target_label: server
        action: replace
      - source_labels: ["__path__"]
        regex: /data/ingester/([^/]+)/([^/]+)/([^/]+)/(.*)
        replacement: $3
        target_label: service
        action: replace
      - source_labels: ["__path__"]
        regex: /data/ingester/([^/]+)/([^/]+)/([^/]+)/(.*)
        replacement: $4
        target_label: logfile
        action: replace
    pipeline_stages:
      - match:
          selector: '{logfile=~"_gc"}'
          stages:
            - regex:
                expression: '^\[(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\S+[-+]\d{4})\]\[\S+\]\[\S+\]\[(?P<level>\S+)\].+'
            - labels:
                time:
                level:
            - timestamp:
                source: time
                format: "2006-01-02T15:04:05.000-0700"
      - match:
          selector: '{logfile=~"coref"}'
          stages:
            - regex:
                expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\S+) - (?P<level>\S+).+'
            - labels:
                time:
                level:
            - timestamp:
                source: time
                format: "2006-01-02 15:04:05,000"
      - match:
          selector: '{service=~"service|web|engine|integration|user"}'
          stages:
            - regex:
                expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\S+)  (?P<level>\S+).+'
            - labels:
                time:
                level:
            - timestamp:
                source: time
                format: "2006-01-02 15:04:05,000"
      - match:
          selector: '{logfile=~"mysqld"}'
          stages:
            - regex:
                expression: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{6}Z) \d+ \[(?P<level>\S+)\].+'
            - labels:
                time:
                level:
            - timestamp:
                source: time
                format: "2006-01-02T15:04:05.000000Z"
      - match:
          selector: '{logfile=~"haproxy"}'
          stages:
            - regex:
                expression: '^(?P<time>\S+\s+\d{1,2}\s\d{2}:\d{2}:\d{2}).+'
            - labels:
                time:
                level:
            - timestamp:
                source: time
                format: "Jan _2 15:04:05"
      - match:
          selector: '{logfile="messages"}'
          stages:
            - regex:
                expression: '^(?P<time>\S+\s+\d{1,2}\s\d{2}:\d{2}:\d{2}).+'
            - labels:
                time:
                level:
            - timestamp:
                source: time
                format: "Jan _2 15:04:05"

I hope you can spot what is wrong in the config. I don't see any errors in the Promtail log; it just sends data to Loki without using the log timestamp, and none of the labels I defined are applied.

For example, with the MySQL log, the entries are loaded but the time shown is the received time, not the time from the log itself. Is it possible to make Promtail send the log timestamp to Loki/Grafana?
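
A minimal sketch of a timestamp pipeline for the mysqld case, under some assumptions: it selects on the filename label that Promtail attaches to each tailed file rather than on the logfile label from the config above, and the .*mysqld.* pattern is only a guess at the real file names. Label matchers are fully anchored, so a selector such as {logfile=~"mysqld"} matches only the exact value mysqld and never mysqld.log. The extracted time is deliberately not promoted to a label, since a label whose value changes on every line creates a new stream per entry.

```yaml
pipeline_stages:
  - match:
      # Assumption: file names contain "mysqld"; =~ is anchored, hence .* on both sides.
      selector: '{job="client-logs", filename=~".*mysqld.*"}'
      stages:
        - regex:
            # Captures a timestamp shaped like 2006-01-02T15:04:05.000000Z and the bracketed level.
            expression: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z) \d+ \[(?P<level>\S+)\]'
        - labels:
            level:
        - timestamp:
            source: time
            # Predefined Promtail format name; accepts fractional seconds and a trailing Z.
            format: RFC3339Nano
```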

Thanks for your help in advance.

I tried a workaround I found on the forum that uses pipeline_stages, but without success.

static_configs:
  - targets:
    - Europa
  - labels:
      job: 'client-logs'
      __path__: /data/ingester/*/*/*/*  # Match all log files, adjust path as needed
pipeline_stages:
  - match:
      selector: '{job="client-logs"}'
      stages:
        - regex:
            source: filename
            expression: '/data/ingester/(?P<filename>.+)'
        - labels:
            filename:

This is what I get from a dry run. There is no "filename" label, only __path__, and it is not expanded, so this is clearly not working.

2025-02-15T08:45:20.134277718+0000 {__path__="/data/ingester/*/*/*/*", job="client-logs"} 2023-08-30 08:44:41.743 WARN 2059236 --- [https-jsse-nio-4601-exec-7] n.i.common.web.api.ApiResponse : ErrorResponse(code=BadRequest, message=Bad Request, requestId=00eb9e03-0e9b-4d83-9581-e1dc6299d22a, fieldErrors=null, fieldArrayErrors=null)
2025-02-15T08:45:20.134280507+0000 {__path__="/data/ingester/*/*/*/*", job="client-logs"} io.grpc.StatusRuntimeException: UNKNOWN

If I change source: filename to source: __path__, then I get the unexpanded glob "*/*/*/*" as the filename.

[inspect: regex stage]:
{stages.Entry}.Extracted["filename"]:
	+: */*/*/*
[inspect: regex stage]:
{stages.Entry}.Extracted["filename"]:
	+: */*/*/*
[inspect: labels stage]:
{stages.Entry}.Entry.Labels:
	-: {__path__="/data/ingester/*/*/*/*", job="client-logs"}
	+: {__path__="/data/ingester/*/*/*/*", filename="*/*/*/*", job="client-logs"}

Please let me know how to extract information from the file path.
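
One possible explanation, assuming the dry run above was fed through standard input: the filename label is only attached when Promtail tails a real file, so in that mode only the literal __path__ glob from static_configs is present. relabel_configs also run before the glob is expanded, so they see the same unexpanded pattern. Below is a sketch that derives the labels inside the pipeline from filename instead. The label names client, server, service and logfile come from the config above; that source: filename resolves to the tailed file's path relies on Promtail seeding the extracted data with the entry's labels, which is worth verifying against a real file with --inspect.

```yaml
pipeline_stages:
  - match:
      selector: '{job="client-logs"}'
      stages:
        - regex:
            # filename is expected to hold the full path of the tailed file:
            # /data/ingester/<client>/<server>/<service>/<file>
            source: filename
            expression: '^/data/ingester/(?P<client>[^/]+)/(?P<server>[^/]+)/(?P<service>[^/]+)/(?P<logfile>.+)$'
        - labels:
            client:
            server:
            service:
            logfile:
```

Stages placed after this one can then select on the new labels, for example with a selector like {service="web"}.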

I finally gave up and used file_sd_configs, which lets me set labels and specify each file path directly, so the filename can be extracted.

file_sd_configs:
  - files:
      - /data/promtail/log_files.yml  # File containing dynamically updated log paths
    refresh_interval: 10s
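
For reference, a sketch of what a target file such as /data/promtail/log_files.yml might contain. It follows the Prometheus file-based service discovery layout (a list of target groups, each with its own labels). The client, server and service values and the path below are hypothetical placeholders, not taken from the real setup.

```yaml
# /data/promtail/log_files.yml (hypothetical contents)
- targets:
    - localhost
  labels:
    job: client-logs
    client: acme        # placeholder client name
    server: srv01       # placeholder server name
    service: web        # placeholder service name
    __path__: /data/ingester/acme/srv01/web/*.log
```

Each directory gets its own entry, so the labels are set explicitly rather than parsed from the path, and Promtail re-reads the file at the configured refresh_interval.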