Promtail configuration issue: --dry-run validates the config, but the live Promtail deployment rejects it

Hi,

I’ve used the --dry-run switch to test my scrape job for Serilog compact logs, and it works as expected when I call it in my terminal (cat sample.log | promtail --config.file promtail.yaml --stdin --dry-run --inspect --client.url http://127.0.0.1:3100/loki/api/v1/push), but when I apply the same configuration to the Promtail deployment, it does not accept it.
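For reference, the json stage expressions in my config (the quoted "@t" and "@l" plus RequestPath and ExceptionDetail.InnerException.Source) are aimed at Serilog compact-format events shaped roughly like the line below; the values here are made up for illustration, and inside the cluster each line additionally carries the CRI prefix that the cri stage strips before the json stage runs.

{"@t":"2021-09-24T13:27:54.2403060Z","@l":"Error","RequestPath":"/api/booking","ExceptionDetail":{"InnerException":{"Source":"Gol.Sabre.Client","ErrorCode":"ERR123","ResponseCode":"500"}}}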

The Kubernetes cluster where Promtail is deployed runs it with the -print-config-stderr switch; below is the output from one pod. It finishes with an error and the pod stays in CrashLoopBackOff.
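For context, the flag is passed as a container argument, roughly like this (the image tag and config path are illustrative, not copied from my actual manifest):

    containers:
      - name: promtail
        image: grafana/promtail:2.3.0        # example tag, not necessarily the one deployed
        args:
          - -config.file=/etc/promtail/promtail.yaml   # example path
          - -print-config-stderr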

# Loki Config
# (version=, branch=, revision=)
server:
  http_listen_address: ""
  http_listen_port: 3101
  http_listen_conn_limit: 0
  grpc_listen_address: ""
  grpc_listen_port: 9095
  grpc_listen_conn_limit: 0
  http_tls_config:
    cert_file: ""
    key_file: ""
    client_auth_type: ""
    client_ca_file: ""
  grpc_tls_config:
    cert_file: ""
    key_file: ""
    client_auth_type: ""
    client_ca_file: ""
  register_instrumentation: true
  graceful_shutdown_timeout: 30s
  http_server_read_timeout: 30s
  http_server_write_timeout: 30s
  http_server_idle_timeout: 2m0s
  grpc_server_max_recv_msg_size: 4194304
  grpc_server_max_send_msg_size: 4194304
  grpc_server_max_concurrent_streams: 100
  grpc_server_max_connection_idle: 2562047h47m16.854775807s
  grpc_server_max_connection_age: 2562047h47m16.854775807s
  grpc_server_max_connection_age_grace: 2562047h47m16.854775807s
  grpc_server_keepalive_time: 2h0m0s
  grpc_server_keepalive_timeout: 20s
  grpc_server_min_time_between_pings: 5m0s
  grpc_server_ping_without_stream_allowed: false
  log_format: logfmt
  log_level: error
  log_source_ips_enabled: false
  log_source_ips_header: ""
  log_source_ips_regex: ""
  http_path_prefix: ""
  external_url: ""
  health_check_target: null
  disable: false
client:
  url: http://172.16.3.22:31000/loki/api/v1/push
  batchwait: 1s
  batchsize: 1048576
  follow_redirects: false
  backoff_config:
    min_period: 500ms
    max_period: 5m0s
    max_retries: 10
  timeout: 10s
  tenant_id: ""
positions:
  sync_period: 10s
  filename: /run/promtail/positions.yaml
  ignore_invalid_yaml: false
scrape_configs:
- job_name: serilog
  pipeline_stages:
  - cri: {}
  - json:
      expressions:
        exceptionType: ExceptionDetail.InnerException.Source
        level: '"@l"'
        requestPath: RequestPath
        time: '"@t"'
  - labels:
      exceptionType: null
      level: null
      requestPath: null
  - timestamp:
      format: RFC3339Nano
      source: time
  - match:
      selector: '{exceptionType=~"Gol.Sabre.+"}'
      stages:
      - json:
          expressions:
            sabreErrorCode: ExceptionDetail.InnerException.ErrorCode
            sabreResponseCode: ExceptionDetail.InnerException.ResponseCode
      - labels:
          sabreErrorCode: null
          sabreResponseCode: null
  static_configs: []
- job_name: kubernetes-pods
  pipeline_stages:
  - cri: {}
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_controller_name]
    separator: ;
    regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
    target_label: __tmp_controller_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name, __meta_kubernetes_pod_label_app,
      __tmp_controller_name, __meta_kubernetes_pod_name]
    separator: ;
    regex: ^;*([^;]+)(;.*)?$
    target_label: app
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component, __meta_kubernetes_pod_label_component]
    separator: ;
    regex: ^;*([^;]+)(;.*)?$
    target_label: component
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [namespace, app]
    separator: /
    regex: (.*)
    target_label: job
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
    separator: /
    regex: (.*)
    target_label: __path__
    replacement: /var/log/pods/*$1/*.log
    action: replace
  - source_labels: [__meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash,
      __meta_kubernetes_pod_annotation_kubernetes_io_config_hash, __meta_kubernetes_pod_container_name]
    separator: /
    regex: true/(.*)
    target_label: __path__
    replacement: /var/log/pods/*$1/*.log
    action: replace
  static_configs: []
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
target_config:
  sync_period: 10s
  stdin: false


level=error ts=2021-09-24T13:27:54.240306089Z caller=main.go:115 msg="error creating promtail" error="unknown scrape config"

I know the problem is with the serilog job, because if it is removed from the configuration the pods start properly.

It’s my first time writing a Promtail scrape configuration, so I’m pretty sure it’s my fault, but since the dry run works as I expected, I’m now completely lost.

If somebody could help, I would really appreciate it.

I figured out what was missing after taking a look at the Promtail source code (manager.go): I found where the error message is logged, and it is related to the job having no service discovery configuration.

I think the error message could be improved, from "unknown scrape config" to something like "this job has no service discovery configuration".

	for _, cfg := range scrapeConfigs {
		switch {
		case cfg.HasServiceDiscoveryConfig():
			targetScrapeConfigs[FileScrapeConfigs] = append(targetScrapeConfigs[FileScrapeConfigs], cfg)
		case cfg.JournalConfig != nil:
			targetScrapeConfigs[JournalScrapeConfigs] = append(targetScrapeConfigs[JournalScrapeConfigs], cfg)
		case cfg.SyslogConfig != nil:
			targetScrapeConfigs[SyslogScrapeConfigs] = append(targetScrapeConfigs[SyslogScrapeConfigs], cfg)
		case cfg.GcplogConfig != nil:
			targetScrapeConfigs[GcplogScrapeConfigs] = append(targetScrapeConfigs[GcplogScrapeConfigs], cfg)
		case cfg.PushConfig != nil:
			targetScrapeConfigs[PushScrapeConfigs] = append(targetScrapeConfigs[PushScrapeConfigs], cfg)
		case cfg.WindowsConfig != nil:
			targetScrapeConfigs[WindowsEventsConfigs] = append(targetScrapeConfigs[WindowsEventsConfigs], cfg)

		default:
			return nil, errors.New("unknown scrape config")
		}
	}
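For example, something along these lines in the default case above (just a sketch; it assumes fmt is imported and that the scrape config struct exposes the job name):

	default:
		// Name the offending job and list the accepted target types instead of
		// the generic "unknown scrape config" message.
		return nil, fmt.Errorf(
			"no target type configured for scrape job %q: add a service discovery section (e.g. static_configs or kubernetes_sd_configs), journal, syslog, gcplog, push or windows_events config",
			cfg.JobName,
		)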

The solution was to add the lines below to the job:

        kubernetes_sd_configs:
        - role: pod
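
With that in place, the whole serilog job ends up like this; the pipeline stages are unchanged, only the discovery section is new:

- job_name: serilog
  pipeline_stages:
  - cri: {}
  - json:
      expressions:
        exceptionType: ExceptionDetail.InnerException.Source
        level: '"@l"'
        requestPath: RequestPath
        time: '"@t"'
  - labels:
      exceptionType: null
      level: null
      requestPath: null
  - timestamp:
      format: RFC3339Nano
      source: time
  - match:
      selector: '{exceptionType=~"Gol.Sabre.+"}'
      stages:
      - json:
          expressions:
            sabreErrorCode: ExceptionDetail.InnerException.ErrorCode
            sabreResponseCode: ExceptionDetail.InnerException.ResponseCode
      - labels:
          sabreErrorCode: null
          sabreResponseCode: null
  kubernetes_sd_configs:
  - role: pod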