Kube-state-metrics and Grafana Agent run as binaries inside a container --> Grafana UI shows no kube-state-metrics metrics

  • What Grafana version and what operating system are you using?
    grafana-agent 0.27.1 on an Ubuntu container image running on an EKS cluster.

  • What are you trying to achieve?
    I am trying to receive kube-state-metrics (KSM) metrics in our Grafana Cloud dashboards.

  • How are you trying to achieve it?
    I have an Ubuntu container that runs the grafana-agent and KSM binaries.
    The container runs in the EKS cluster as a Deployment; I configured the agent to collect data from the cluster and send it to our Grafana Cloud instance.

  • What happened?
    I see kubelet and cAdvisor metrics, but not the KSM ones.

  • What did you expect to happen?
    I expect to see kubelet, cAdvisor, and KSM metrics in the dashboard.

  • Can you copy/paste the configuration(s) that you are having problems with?
    This is the grafana-agent configuration:

server:
  log_level: debug
prometheus:
  wal_directory: /tmp/grafana-agent-wal
  global:
    scrape_interval: 60s
    external_labels:
      client_id: asdfsdfsdfsdfsdfsdfuw=
      client_name: pavel-client
      service_name: pavel-service
  configs:
  - name: integrations
    remote_write:
    - url: https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push
      basic_auth:
        username: 234234
        password: password
    scrape_configs:
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/kubernetes/kube-state-metrics
      scrape_interval: 30s
      kubernetes_sd_configs:
          - role: pod
      metric_relabel_configs:
          - source_labels: [__name__]
            regex: kube_resourcequota|kube_node_info|kube_node_status_condition|kube_node_status_allocatable|kube_node_status_capacity|kube_node_spec_taint|kube_horizontalpodautoscaler_status_condition|kube_horizontalpodautoscaler_spec_min_replicas|kube_horizontalpodautoscaler_spec_max_replicas|kube_horizontalpodautoscaler_spec_target_metric|kube_horizontalpodautoscaler_status_current_replicas|kube_horizontalpodautoscaler_status_desired_replicas|kube_pod_info|kube_pod_owner|kube_pod_status_phase|kube_pod_container_info|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_waiting_reason|kube_pod_container_status_restarts_total|kube_deployment_status_replicas_updated|kube_deployment_spec_replicas|kube_deployment_status_replicas_available|kube_replicaset_owner|kube_replicaset_spec_replicas|kube_replicaset_status_ready_replicas|kube_job_owner|kube_job_status_active|kube_job_failed|kube_statefulset_status_observed_generation|kube_statefulset_status_replicas_updated|kube_statefulset_replicas|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_statefulset_status_replicas_ready|kube_statefulset_status_replicas|kube_statefulset_status_update_revision|kube_daemonset_status_number_misscheduled|kube_daemonset_status_updated_number_scheduled|kube_daemonset_status_number_available|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_current_number_scheduled
            action: keep
      relabel_configs:
          - action: keep
            regex: kube-state-metrics
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_name
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/kubernetes/cadvisor
      kubernetes_sd_configs:
          - role: node
      metric_relabel_configs:
          - source_labels: [__name__]
            regex: container_network_receive_bytes_total|container_network_transmit_bytes_total|container_memory_rss|container_memory_working_set_bytes|container_cpu_usage_seconds_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total
            action: keep
      relabel_configs:
          - replacement: kubernetes.default.svc.cluster.local:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
            source_labels:
              - __meta_kubernetes_node_name
            target_label: __metrics_path__
      scheme: https
      tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
          server_name: kubernetes
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/kubernetes/kubelet
      kubernetes_sd_configs:
          - role: node
      metric_relabel_configs:
          - source_labels: [__name__]
            regex: kubelet_node_name|kubelet_node_config_error|kubelet_running_pods|kubelet_running_pod_count|kubelet_running_containers|kubelet_running_container_count|kubelet_volume_stats_inodes_used|kubelet_volume_stats_inodes|kubelet_volume_stats_available_bytes|kubelet_volume_stats_capacity_bytes
            action: keep
      relabel_configs:
          - replacement: kubernetes.default.svc.cluster.local:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/$1/proxy/metrics
            source_labels:
              - __meta_kubernetes_node_name
            target_label: __metrics_path__
      scheme: https
      tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
          server_name: kubernetes
loki:
  configs:
  - name: default
    positions:
      filename: /tmp/positions.yaml
    scrape_configs:
      - job_name: some-logs
        static_configs:
          - targets: [localhost]
            labels:
              job: some-logs
              __path__: /var/log/somnething.log
              client_id: afdasdfasdasdf
              client_name: pavel-client
              service_name: pavel-service
    clients:
      - url: https://logs-prod-us-central1.grafana.net/api/prom/push
        basic_auth:
          username: 123123
          password: password

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

No errors or warnings pertaining to kube-state-metrics.
Running the following command, I can see that KSM is scraping data from the cluster successfully:

curl http://localhost:8080/metrics

Running curl http://0.0.0.0:80/agent/api/v1/metrics/targets | jq | grep job

I get the following response:

"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"

  • Did you follow any online instructions? If so, what is the URL?

I did not follow the default Helm installation guidelines as-is, but rather broke them into several parts of our deployment.

kube-state-metrics watches Kubernetes resources in your cluster and emits Prometheus metrics that can be scraped by Grafana Agent. To learn more, please see the kube-state-metrics docs.
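For illustration only (hypothetical label values, not output from this cluster), a KSM endpoint serves plain Prometheus exposition text such as:

kube_pod_info{namespace="default",pod="example-pod",node="example-node"} 1
kube_deployment_spec_replicas{namespace="default",deployment="example-deployment"} 1

Both of these metric names appear in the keep regex of the config above.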

The agent/integration config you posted has the agent scrape KSM, cAdvisor, and the kubelet; all three need to be present in the cluster as well. The missing piece is deploying kube-state-metrics to your cluster.
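
To see why that matters with this config: the kube-state-metrics job discovers pods through kubernetes_sd_configs (role: pod) and then keeps only pods whose app.kubernetes.io/name label equals kube-state-metrics. The scrape can therefore only succeed if some pod in the cluster carries that label, e.g. (hypothetical pod metadata, names not taken from this cluster):

metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics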

Deploy kube-state-metrics

If you do not wish to use Helm to manage kube-state-metrics, you can use the example manifests in the kube-state-metrics repo.
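For instance, assuming a local checkout of the kube-state-metrics repo, the standard manifests can be applied with:

kubectl apply -f examples/standard/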

For a Helm install, run the following commands from your shell to install kube-state-metrics into the default namespace of your Kubernetes cluster:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install ksm prometheus-community/kube-state-metrics --set image.tag=v2.4.2 -n default

To deploy kube-state-metrics into a non-default Namespace, please change -n default to your desired Namespace.
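
Once deployed, one way to confirm that the pod is running and carries the label the agent's relabel rule matches on (the chart labels its pods app.kubernetes.io/name=kube-state-metrics by default) is:

kubectl get pods -n default -l app.kubernetes.io/name=kube-state-metrics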


Hi Petero, thank you for the quick response. I am running the KSM binary in the same container as the grafana-agent. I believe I have all the ConfigMaps and namespaces set up correctly, because I can see the KSM data when I run:

curl http://localhost:8080/metrics

I see the cAdvisor and kubelet data being sent to the Grafana UI successfully, but the KSM data is not…
Is there any way I can debug this and understand what went wrong?
I don’t see any errors in the logs.
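
One debugging angle worth noting: the kube-state-metrics job in the config above relies on pod discovery (kubernetes_sd_configs with role: pod) plus a keep rule on the app.kubernetes.io/name=kube-state-metrics pod label, so a KSM binary running inside the agent's own container is invisible to that job unless the agent's pod itself carries that label. A minimal sketch of a scrape job that bypasses discovery entirely, assuming KSM listens on localhost:8080 as the curl above suggests (illustrative only, not the confirmed resolution of this thread):

scrape_configs:
- job_name: integrations/kubernetes/kube-state-metrics
  static_configs:
  - targets: ['localhost:8080']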

Hey @pavelzagalsky

Happy to help work through the use case and config.
Please send in a note to cloud-success@grafana.com.

Thanks!