- What Grafana version and what operating system are you using?
grafana-agent 0.27.1 in an Ubuntu container image running on an EKS cluster.
- What are you trying to achieve?
I am trying to receive kube-state-metrics (KSM) metrics in the Grafana Cloud dashboards.
- How are you trying to achieve it?
I have an Ubuntu container that runs the grafana-agent and KSM binaries.
I configured them to collect data from the EKS cluster they run in (as a deployment) and to send it to our Grafana Cloud instance.
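For context, the deployment looks roughly like the sketch below; the name, image, and ports are placeholders rather than our actual manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-agent-ksm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-agent-ksm
  template:
    metadata:
      labels:
        app: grafana-agent-ksm
    spec:
      serviceAccountName: grafana-agent   # service account with RBAC for pod/node discovery
      containers:
        - name: agent-and-ksm
          image: <our-registry>/ubuntu-agent-ksm:latest   # Ubuntu image bundling grafana-agent and kube-state-metrics
          ports:
            - containerPort: 8080   # kube-state-metrics /metrics endpoint
            - containerPort: 80     # grafana-agent HTTP server
          volumeMounts:
            - name: agent-config
              mountPath: /etc/agent
      volumes:
        - name: agent-config
          configMap:
            name: grafana-agent-config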
- What happened?
I see the kubelet and cAdvisor metrics but not the KSM ones.
- What did you expect to happen?
I expected to see the kubelet, cAdvisor, and KSM metrics in the dashboard.
- Can you copy/paste the configuration(s) that you are having problems with?
This is the grafana-agent configuration:
server:
  log_level: debug
prometheus:
  wal_directory: /tmp/grafana-agent-wal
  global:
    scrape_interval: 60s
    external_labels:
      client_id: asdfsdfsdfsdfsdfsdfuw=
      client_name: pavel-client
      service_name: pavel-service
  configs:
    - name: integrations
      remote_write:
        - url: https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push
          basic_auth:
            username: 234234
            password: password
      scrape_configs:
        - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          job_name: integrations/kubernetes/kube-state-metrics
          scrape_interval: 30s
          kubernetes_sd_configs:
            - role: pod
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: kube_resourcequota|kube_node_info|kube_node_status_condition|kube_node_status_allocatable|kube_node_status_capacity|kube_node_spec_taint|kube_horizontalpodautoscaler_status_condition|kube_horizontalpodautoscaler_spec_min_replicas|kube_horizontalpodautoscaler_spec_max_replicas|kube_horizontalpodautoscaler_spec_target_metric|kube_horizontalpodautoscaler_status_current_replicas|kube_horizontalpodautoscaler_status_desired_replicas|kube_pod_info|kube_pod_owner|kube_pod_status_phase|kube_pod_container_info|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_waiting_reason|kube_pod_container_status_restarts_total|kube_deployment_status_replicas_updated|kube_deployment_spec_replicas|kube_deployment_status_replicas_available|kube_replicaset_owner|kube_replicaset_spec_replicas|kube_replicaset_status_ready_replicas|kube_job_owner|kube_job_status_active|kube_job_failed|kube_statefulset_status_observed_generation|kube_statefulset_status_replicas_updated|kube_statefulset_replicas|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_statefulset_status_replicas_ready|kube_statefulset_status_replicas|kube_statefulset_status_update_revision|kube_daemonset_status_number_misscheduled|kube_daemonset_status_updated_number_scheduled|kube_daemonset_status_number_available|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_current_number_scheduled
              action: keep
          relabel_configs:
            - action: keep
              regex: kube-state-metrics
              source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_name
        - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          job_name: integrations/kubernetes/cadvisor
          kubernetes_sd_configs:
            - role: node
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: container_network_receive_bytes_total|container_network_transmit_bytes_total|container_memory_rss|container_memory_working_set_bytes|container_cpu_usage_seconds_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total
              action: keep
          relabel_configs:
            - replacement: kubernetes.default.svc.cluster.local:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
              source_labels:
                - __meta_kubernetes_node_name
              target_label: __metrics_path__
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
            server_name: kubernetes
        - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          job_name: integrations/kubernetes/kubelet
          kubernetes_sd_configs:
            - role: node
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: kubelet_node_name|kubelet_node_config_error|kubelet_running_pods|kubelet_running_pod_count|kubelet_running_containers|kubelet_running_container_count|kubelet_volume_stats_inodes_used|kubelet_volume_stats_inodes|kubelet_volume_stats_available_bytes|kubelet_volume_stats_capacity_bytes
              action: keep
          relabel_configs:
            - replacement: kubernetes.default.svc.cluster.local:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$1/proxy/metrics
              source_labels:
                - __meta_kubernetes_node_name
              target_label: __metrics_path__
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
            server_name: kubernetes
loki:
  configs:
    - name: default
      positions:
        filename: /tmp/positions.yaml
      scrape_configs:
        - job_name: some-logs
          static_configs:
            - targets: [localhost]
              labels:
                job: some-logs
                __path__: /var/log/somnething.log
                client_id: afdasdfasdasdf
                client_name: pavel-client
                service_name: pavel-service
      clients:
        - url: https://logs-prod-us-central1.grafana.net/api/prom/push
          basic_auth:
            username: 123123
            password: password
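For reference, the kube-state-metrics job above discovers pods (role: pod) and then keeps only targets whose pod carries the app.kubernetes.io/name: kube-state-metrics label (the __meta_kubernetes_pod_label_app_kubernetes_io_name source label maps to that pod label after label-name sanitization). A pod would need metadata roughly like this sketch to survive the keep rule; the pod name is illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: kube-state-metrics-0                      # illustrative name
  labels:
    app.kubernetes.io/name: kube-state-metrics    # exposed to relabeling as __meta_kubernetes_pod_label_app_kubernetes_io_name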
- Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
No errors or warnings pertaining to kube-state-metrics.
Running the following command, I can see that KSM itself is scraping data from the cluster successfully:
curl http://localhost:8080/metrics
Running curl http://0.0.0.0:80/agent/api/v1/metrics/targets | jq | grep job
I get the following response, with no kube-state-metrics job listed:
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/cadvisor"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
"job": "integrations/kubernetes/kubelet"
- Did you follow any online instructions? If so, what is the URL?
I did not use the default Helm installation guidelines but rather broke them up into several parts of our deployment.