- What Grafana version and what operating system are you using?
v8.2.2 (6232fe07c0), self-hosted on Debian GNU/Linux 11.2
- What are you trying to achieve?
On our cluster, users can select what resources their jobs need, most prominently the number of CPU cores and RAM. We would like to create a panel to visualize what resources are currently available (and as a bonus, how that changed over time and/or how it looked at some time stamp in the past).
Currently, we feed two metrics per host into Prometheus, compute_cpus and compute_memory_bytes, which are simple gauge metrics with only two labels: host, which is the hostname, and q, which can be total or available. But as we inject those via a Python script into Prometheus, we have full control over them and could also aggregate the data before it is entered into Prometheus.
Ideally, q="available" resources could then be displayed as a two-dimensional histogram with color-coded “height”, using logarithmic bins to keep the ranges under control (currently the CPU metric can be 0 <= value <= 128 and the memory metric 0 <= value <= 512 GByte).
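To make the binning idea concrete, here is a minimal sketch of the kind of pre-aggregation our Python script could do before pushing anything to Prometheus. It is only an illustration under assumptions: the function names, the power-of-two bins, the GiB unit for memory, and the sample hosts are all made up here and are not part of our actual script.

```python
from collections import Counter

GIB = 1024 ** 3  # assumption: memory is binned in whole GiB


def log2_bin(value):
    """Return the lower edge of the power-of-two bin containing value.

    Values below 1 fall into bin 0; 1 -> 1, 2..3 -> 2, 4..7 -> 4, and so on.
    """
    n = int(value)
    return 0 if n < 1 else 1 << (n.bit_length() - 1)


def histogram_2d(available):
    """Count hosts per (cpu_bin, mem_bin_gib) cell.

    `available` maps hostname -> (free CPU cores, free memory in bytes).
    """
    counts = Counter()
    for cpus, mem_bytes in available.values():
        counts[(log2_bin(cpus), log2_bin(mem_bytes // GIB))] += 1
    return counts


# Hypothetical sample data, not real cluster values.
sample = {
    "node01": (3, 6 * GIB),
    "node02": (2, 5 * GIB),
    "node03": (128, 512 * GIB),
    "node04": (0, 0),
}
hist = histogram_2d(sample)
for (cpu_bin, mem_bin), n in sorted(hist.items()):
    print(f"cpus>={cpu_bin} mem>={mem_bin}GiB: {n} host(s)")
```

Each cell count could then be exported as its own gauge with the bin edges as labels, but how best to visualize that in Grafana is exactly what we are unsure about.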
- How are you trying to achieve it?
After experimenting with the built-in histogram feature, trying the heatmap visualization, and looking at other visualizations, we are mostly at a loss as to how to tackle the problem.
The only way we could get it to “somehow work” was to define one panel with the Stat visualization, keep the memory range constant within the panel, and use multiple queries like
count(compute_cpus{q="available"}>=2 and compute_cpus{q="available"}<4 and compute_memory_bytes{q="available"}>=500000000 and compute_memory_bytes{q="available"}<1000000000)
for each cell, varying only the CPU parts. For different memory ranges, we would then try to use row repetition. However, hard-coding all this looks pretty cumbersome and error-prone.
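The query strings themselves could at least be generated rather than typed by hand. A rough sketch follows, assuming the same metric and label names as above; the helper names and the bin edges are hypothetical, and this still leaves wiring the queries into panels as a manual step.

```python
def cell_query(cpu_lo, cpu_hi, mem_lo, mem_hi):
    """Build the PromQL count query for one (CPU range, memory range) cell."""
    return (
        f'count(compute_cpus{{q="available"}}>={cpu_lo} '
        f'and compute_cpus{{q="available"}}<{cpu_hi} '
        f'and compute_memory_bytes{{q="available"}}>={mem_lo} '
        f'and compute_memory_bytes{{q="available"}}<{mem_hi})'
    )


def all_cell_queries(cpu_edges, mem_edges):
    """Map (cpu_lo, mem_lo) -> query for every pair of adjacent bin edges."""
    return {
        (c_lo, m_lo): cell_query(c_lo, c_hi, m_lo, m_hi)
        for c_lo, c_hi in zip(cpu_edges, cpu_edges[1:])
        for m_lo, m_hi in zip(mem_edges, mem_edges[1:])
    }


# Hypothetical bin edges: CPU cores and memory in bytes.
cpu_edges = [1, 2, 4, 8]
mem_edges = [500000000, 1000000000, 2000000000]
queries = all_cell_queries(cpu_edges, mem_edges)
print(queries[(2, 500000000)])
```

The printed query for the (2, 500000000) cell is the same as the hand-written one above.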
Is there a better way?
Cheers
Carsten
PS: Sorry for such an open question with not really matching tags, but I was completely unsure where to place it.