-
What Grafana version and what operating system are you using?
Grafana v10.0.3 on Kubernetes Cluster
-
What are you trying to achieve?
Build a Dashboard with some basic information about the Cluster like Node count, Pod count…
-
How are you trying to achieve it?
sum(cluster:master_nodes)
-
What happened?
Showed 6 Master Nodes
-
What did you expect to happen?
Showing 3 Master Nodes
-
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
I didn't receive any errors.
I looked into the query responses and saw that I get the same data from two Prometheus replicas. Is there an option to consolidate the data if it is the same but comes from two replicas? I did some googling and found how to do it with two separate datasources, e.g. Prometheus and InfluxDB, but in my case it counts as one datasource in Grafana, since it is a single Prometheus that has two replicas for HA.
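For reference, this is roughly the kind of deduplication I'm after (just a sketch; it assumes the two replicas are distinguished by an external label such as prometheus_replica, which may be named differently in your setup):

# Sketch only: collapse the duplicate series coming from the two HA
# replicas before summing, so each master node is counted once.
sum(
  max without (prometheus_replica) (cluster:master_nodes)
)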
Cheers
Marcel
Hi everyone,
I’m also encountering this issue after increasing the Prometheus replicas to 2 for high availability (HA). This is causing the following metrics to display incorrect values, effectively doubling them:
- cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
- cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
- kube_node_status_capacity
- kube_node_status_allocatable
- Any metric with the instance label
Problem:
The main issue is that each metric appears to be logged twice, causing doubled results when summed. For example, metrics like node capacity and allocation display duplicate values for the same nodes.
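For illustration, one way to confirm the duplication is to count the series per node (a sketch, assuming the resource and unit labels exposed by recent kube-state-metrics versions). In a healthy setup this should return 1 per node; with the duplication described above it returns 2 for every node:

# Sketch: count how many capacity series exist per node.
count by (node) (kube_node_status_capacity{resource="cpu", unit="core"})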
Example:
The table format shows that there are multiple entries for the same node, each with identical capacity values. See the tables below:
Here’s an example confirming that different instance values actually correspond to the same node UUID:
Cause:
From my investigation, it seems that each node in the cluster reports the same two instance values as every other node. These instance values change over time but remain identical across the cluster at any given time. The metrics are correct when only one instance is present, so this duplication from both Prometheus replicas seems to be at the root of the issue.
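A possible workaround (just a sketch; it assumes the duplicate series differ only in the instance label and that the node label still identifies each node) would be to collapse the duplicates before summing:

# Sketch: take the max across the duplicate instance values so each
# node contributes a single series, then sum across nodes.
sum(
  max without (instance) (kube_node_status_capacity{resource="cpu", unit="core"})
)

That hides the symptom but doesn't explain where the second instance value comes from, hence the questions below.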
Questions:
If anyone has dealt with this before, I would appreciate any guidance on:
- Understanding the origin of the instance label in these duplicated metrics
- Identifying why each node is reported with multiple instance values
Thanks in advance for any help!
I have the same problem. Have you found any solution for this issue?