Promtail k8s: add nginx log parsing

Hi, I have deployed the Promtail Helm chart and I am ingesting logs. Many of my apps are React apps served by nginx, but their logs are not being parsed properly.
Here is an example from Grafana.

I have found this config, which properly parses the nginx logs:

 pipeline_stages:
  - match:
      selector: '{job="nginx"}'
      stages:
      - regex:
          expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"?'
      - labels:
          remote_addr:
          remote_user:
          time_local:
          method:
          request:
          protocol:
          status:
          body_bytes_sent:
          http_referer:
          http_user_agent:

which outputs the following log when I test it:

2021-06-29T13:25:31     {__path__="/var/log/nginx/*log", body_bytes_sent="2595", host="xx", http_referer="-", http_user_agent="kube-probe/1.19+", job="nginx", method="GET", protocol="HTTP/1.1", remote_addr="xx.xx.xx.xx", remote_user="-", request="/", status="200", time_local="29/Jun/2021:17:21:45 +0000"}172.xx.xx.xx - - [29/Jun/2021:17:21:45 +0000] "GET / HTTP/1.1" 200 2595 "-" "kube-probe/1.19+" "-"

My problem: with all the relabeling that takes place in the Promtail Helm chart, how do I identify the containers that I want to apply this nginx config to?
Thanks
-rob

Hi @rreilly

So my understanding is that you have multiple containers running in Kubernetes, some of them nginx containers, and you want to apply the above parsing only to the nginx containers, correct?
And you use Kubernetes service discovery for scraping the logs via Promtail.

So I would recommend two things.

  1. Use relabel_configs to make the container label available, e.g.:
relabel_configs:
  - action: replace
    source_labels:
      - __meta_kubernetes_pod_container_name
    target_label: container

Now you have the container label available to use in the pipeline_stages match stage.

  2. In your above pipeline_stages, set match.selector: '{container="nginx"}' (or something similar, based on your nginx container name).
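Putting both steps together, a trimmed-down scrape config could look like this (a sketch only; the job_name and the container name "nginx" are assumptions you would adapt to your cluster):

```yaml
scrape_configs:
  - job_name: kubernetes-pods        # assumed job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # surface the pod's container name as a "container" label
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
    pipeline_stages:
      - match:
          # runs only for containers named "nginx"
          selector: '{container="nginx"}'
          stages:
            - regex:
                expression: '...'  # the nginx access-log regex from the first post
            - labels:
                status:
```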

Now it should parse logs properly only for nginx containers.

Does this answer your question?

I tried this and it worked as expected, but it seems to go very much against one of the basic principles of Loki: keeping a minimal index. The stream count exploded, and I only enabled this on one small test Kubernetes cluster…
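If the regex approach is still wanted, one way to limit the stream explosion (a sketch, not something from this thread) is to promote only low-cardinality fields such as status and method to labels, and leave the rest in the log line:

```yaml
pipeline_stages:
  - match:
      selector: '{container="nginx"}'
      stages:
        - regex:
            expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"?'
        - labels:
            # only low-cardinality fields become labels; remote_addr,
            # request, user agent etc. stay in the log line
            method:
            status:
```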

For me, it looks like changing the Nginx log format will be a better approach.

This is what I have done so far. We use this Nginx ingress controller (I think there are several alternatives). I added this to its ConfigMap to get logfmt logs:

  log-format-upstream: >
    remote_addr=$remote_addr
    remote_user=$remote_user
    time_local=$time_local
    method=$request_method
    request=$request_uri
    scheme=$scheme
    status=$status
    body_bytes_sent=$body_bytes_sent
    http_referer=$http_referer
    http_user_agent=$http_user_agent
    request_length=$request_length
    request_time=$request_time
    proxy_upstream_name=$proxy_upstream_name
    proxy_alternative_upstream_name=$proxy_alternative_upstream_name
    upstream_addr=$upstream_addr
    upstream_response_length=$upstream_response_length
    upstream_response_time=$upstream_response_time
    upstream_status=$upstream_status
    req_id=$req_id

With that in place, the logfmt lines can be checked in Grafana with a simple query:

{container="nginx-ingress-controller"} |= "body_bytes_sent"
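Once the lines are logfmt, Loki's logfmt parser can extract the fields at query time, so nothing extra needs to be indexed as labels. For example (field names taken from the log-format above), this keeps only server errors:

```logql
{container="nginx-ingress-controller"}
  | logfmt
  | status >= 500
```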