Dear community,
I currently have the problem that labels for per-pod CPU usage metrics seem to get lost in Prometheus. The metric itself is there, but the additional labels appear to be missing.
My River config for the components looks as follows. At the moment I am only allowing one metric to pass through, so that I can gradually add additional metrics later.
prometheus.exporter.cadvisor "default" {
  store_container_labels = true
  enabled_metrics        = ["cpu", "cpuLoad", "percpu", "memory", "memory_numa", "referenced_memory", "cpu_topology", "cpuset"]
  storage_duration       = "3m"
}

prometheus.scrape "cadvisor" {
  targets    = prometheus.exporter.cadvisor.default.targets
  forward_to = [prometheus.relabel.add_cluster_label.receiver]
}

prometheus.relabel "add_cluster_label" {
  forward_to = [prometheus.remote_write.grafana_cloud.receiver]

  rule {
    action        = "keep"
    source_labels = ["__name__"]
    regex         = "container_cpu_usage_seconds_total"
  }

  rule {
    replacement  = "test"
    target_label = "cluster"
  }
}

prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://prometheus-prod-01-eu-west-0.grafana.net/api/prom/push"

    basic_auth {
      username = secret
      password = secret
    }
  }
}
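With this config, the only series that should reach Grafana Cloud belong to the one metric the keep rule lets through, carrying the cluster label added by the second rule. The check query I run in Grafana is therefore essentially just this (a minimal sketch):

container_cpu_usage_seconds_total{cluster="test"}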
However, when checking the Grafana Agent UI, the arguments look a little different:
And the result in Grafana looks like this:
From other resources I saw that the cAdvisor metric container_cpu_usage_seconds_total can carry additional labels for containers and pods. Those labels seem to be missing, however, and I am only getting one series per worker node of the cluster. For Grafana Agent, I thought the store_container_labels argument would add those labels.
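For reference, this is the kind of per-pod breakdown I would expect to be possible if those labels were attached (a sketch assuming the usual cAdvisor/Kubernetes label names namespace, pod and container; the exact names may differ for this exporter):

sum by (namespace, pod, container) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)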
Am I missing something, or is my interpretation of the metric wrong? Is there an alternative metric I could use?