Troubleshooting kubernetes_sd_configs not finding any pods

I’m having trouble getting Promtail to pick up any Pod logs at all in Kubernetes.

My Promtail config is as follows:

  - job_name: my-pods
    kubernetes_sd_configs:
      - role: pod

Promtail is deployed as a DaemonSet, with a serviceAccount that has permission to get, watch, and list pods. I have verified that these Promtail pods have access by copying kubectl into the same pod and running kubectl get pods inside it, which lists the other active pods on the same node.
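For completeness, the RBAC that pod discovery needs looks roughly like this (a sketch; the `promtail` names and the namespace are assumptions, not taken from the thread):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promtail              # assumed name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promtail              # assumed name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: promtail
subjects:
  - kind: ServiceAccount
    name: promtail            # assumed serviceAccount name
    namespace: default        # assumed namespace
```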

When this DaemonSet starts Promtail, it produces the following log:

level=info ts=2021-11-12T12:55:19.936029852Z caller=kubernetes.go:284 component=discovery discovery=kubernetes msg="Using pod service account via in-cluster config"
level=debug ts=2021-11-12T12:55:19.937090573Z caller=manager.go:195 component=discovery msg="Starting provider" provider=kubernetes/0 subs=[my-pods]
level=info ts=2021-11-12T12:55:19.938091192Z caller=server.go:260 http=[::]:9081 grpc=[::]:44783 msg="server listening on addresses"
level=info ts=2021-11-12T12:55:19.939912528Z caller=main.go:119 msg="Starting Promtail" version="(version=2.4.1, branch=HEAD, revision=f61a4d261)"

If I start other Pods on the node, there is no indication in the Promtail log, no labels or log lines are shipped to Loki, and browsing the debug interface at /targets and /service-discovery shows 0 active targets.

The /metrics endpoint shows, among lots of other things:

prometheus_sd_kubernetes_events_total{event="add",role="pod"} 0
prometheus_sd_kubernetes_http_request_duration_seconds_sum{endpoint="/api/v1/pods"} 0.032409927
prometheus_sd_kubernetes_http_request_duration_seconds_count{endpoint="/api/v1/pods"} 1
# HELP prometheus_sd_kubernetes_http_request_total Total number of HTTP requests to the Kubernetes API by status code.
# TYPE prometheus_sd_kubernetes_http_request_total counter
prometheus_sd_kubernetes_http_request_total{status_code="200"} 4

How do I proceed with troubleshooting this?

Hello @hterik ,

I was going through the same exercise a few weeks ago 🙂

I’m running grafana-agent on Kubernetes, and as far as I know its log collection config is identical to the Promtail config.

This is what my config looks like:

      - name: kubernetes_pods
        positions:
          filename: /tmp/positions_pods.yaml
        scrape_configs:
          - job_name: kubernetes_pods
            kubernetes_sd_configs:
              - role: pod
            pipeline_stages:
              - docker: {}
            relabel_configs:
              - source_labels:
                  - __meta_kubernetes_pod_controller_name
                target_label: __service__
              - source_labels:
                  - __meta_kubernetes_pod_node_name
                target_label: __host__
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)
              - action: replace
                replacement: $1
                source_labels:
                  - name
                target_label: job
              - action: replace
                source_labels:
                  - __meta_kubernetes_namespace
                target_label: namespace
              - action: replace
                source_labels:
                  - __meta_kubernetes_pod_name
                target_label: pod
              - action: replace
                source_labels:
                  - __meta_kubernetes_pod_container_name
                target_label: container
              - replacement: /var/log/pods/*$1/*.log
                separator: /
                source_labels:
                  - __meta_kubernetes_pod_uid
                  - __meta_kubernetes_pod_container_name
                target_label: __path__

If you use network policies and namespaces, make sure Promtail is allowed to talk to the Kubernetes API; RBAC also needs to grant Promtail that access.
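As a rough illustration, an egress NetworkPolicy permitting API access might look like this (every name, label, and address here is an assumption; the right CIDR and port depend on your cluster's API server endpoint):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-promtail-apiserver   # assumed name
  namespace: monitoring            # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: promtail                # assumed pod label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.96.0.1/32     # assumed API server ClusterIP
      ports:
        - protocol: TCP
          port: 443                # assumed API server port
```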

Hope that helps. I’ll see if I find some other notes from when I was troubleshooting my issues.

Figured it out now: the Promtail Pod must have the HOSTNAME environment variable set, via the spec.nodeName downward API, with appropriate RBAC to use this API.
It must also mount the following hostPaths: /var/lib/docker/containers and /var/log/pods.
On top of this, relabel_configs similar to the ones from b0b above need to be part of the scrape_config.
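The relevant fragment of the DaemonSet pod spec would then look roughly like this (a sketch; the container and volume names are made up for illustration):

```yaml
spec:
  containers:
    - name: promtail                 # assumed container name
      env:
        - name: HOSTNAME             # used by kubernetes_sd_configs to filter to this node
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      volumeMounts:
        - name: docker
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: pods
          mountPath: /var/log/pods
          readOnly: true
  volumes:
    - name: docker
      hostPath:
        path: /var/lib/docker/containers
    - name: pods
      hostPath:
        path: /var/log/pods
```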

So basically kubernetes_sd_configs only creates labels (provided HOSTNAME is present); you then have to configure the rest of the mapping from those labels to the log files on disk yourself.

The documentation around this area isn’t very clear; the best resource is to see how it’s done in helm-charts/charts/promtail at main · grafana/helm-charts · GitHub

Also, don’t forget to mount the positions file in a volume with the same lifetime as the node.
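A sketch of that, assuming the positions file lives under /run/promtail on the node (the path and names are assumptions):

```yaml
# Promtail config: write positions under the mounted path (path is an assumption)
positions:
  filename: /run/promtail/positions.yaml
---
# DaemonSet pod spec fragment: back the path with a hostPath volume
# so positions survive Promtail Pod restarts on the same node
spec:
  containers:
    - name: promtail               # assumed container name
      volumeMounts:
        - name: positions
          mountPath: /run/promtail
  volumes:
    - name: positions
      hostPath:
        path: /run/promtail
```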