How to display available resources (like a 2D histogram)?

  • What Grafana version and what operating system are you using?
    v8.2.2 (6232fe07c0), self-hosted on Debian GNU/Linux 11.2

  • What are you trying to achieve?
    On our cluster, users can select what resources their jobs need, most prominently the number of CPU cores and the amount of RAM. We would like to create a panel to visualize what resources are currently available (and as a bonus, how that changed over time and/or how it looked at some point in the past).

Currently, we feed two metrics per host into Prometheus, compute_cpus and compute_memory_bytes. Both are simple gauge metrics with only two labels: host, which is the hostname, and q, which can be total or available. As we inject those into Prometheus via a Python script, we have full control over them and could also aggregate the data before it enters Prometheus.

Ideally, the q="available" resources could then be displayed as a two-dimensional histogram with color-coded “height”, using logarithmic bins to keep the ranges under control (currently the CPU metric can be 0<=value<=128 and the memory metric 0<=value<=512 GByte).
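As a rough sketch of what such logarithmic binning could look like in Python: the helper below maps a value to the upper edge of its bin. Power-of-two edges are an assumption (any log base would do), and `pow2_upper_edge` is a hypothetical name, not part of our current script.

```python
def pow2_upper_edge(value: int) -> int:
    """Return the smallest power of two >= value (0 stays 0).

    Hypothetical helper; power-of-two bin edges are an assumption.
    """
    if value <= 0:
        return 0
    return 1 << (value - 1).bit_length()


# The full 0..128 core range collapses into a handful of bins.
cpu_bins = sorted({pow2_upper_edge(n) for n in range(0, 129)})
print(cpu_bins)
```

The same helper works on memory in bytes, since it only uses integer arithmetic.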

  • How are you trying to achieve it?

After experimenting with the built-in histogram feature, trying the heatmap visualization, and checking other visualizations, we are mostly at a loss as to how to tackle the problem.

The only way we could get it to “somehow work” was to define one panel with the Stat visualization, keep the memory range constant within the panel, and use multiple queries like

count(compute_cpus{q="available"}>=2 and compute_cpus{q="available"}<4 and compute_memory_bytes{q="available"}>=500000000 and compute_memory_bytes{q="available"}<1000000000)

for each cell, varying only the CPU part. For different memory ranges, we would then try to use row repetition. However, hard-coding all this looks pretty cumbersome and error-prone.
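One way to at least avoid typing each cell by hand might be to generate the query strings from lists of bin edges, for example in Python. This is only a sketch: the edge lists are illustrative, `cell_query` is a hypothetical helper, and how the generated expressions get into the dashboard is left open.

```python
CPU = 'compute_cpus{q="available"}'
MEM = 'compute_memory_bytes{q="available"}'


def cell_query(cpu_lo, cpu_hi, mem_lo, mem_hi):
    """Build the count() expression for one cell: lo <= value < hi."""
    return (
        f"count({CPU}>={cpu_lo} and {CPU}<{cpu_hi} "
        f"and {MEM}>={mem_lo} and {MEM}<{mem_hi})"
    )


# Illustrative edges only; adjust to the real ranges.
cpu_edges = [1, 2, 4, 8]
mem_edges = [500_000_000, 1_000_000_000, 2_000_000_000]

# One query per (memory range, CPU range) cell, row by row.
queries = [
    cell_query(c_lo, c_hi, m_lo, m_hi)
    for m_lo, m_hi in zip(mem_edges, mem_edges[1:])
    for c_lo, c_hi in zip(cpu_edges, cpu_edges[1:])
]

for q in queries:
    print(q)
```

This removes the copy-and-paste errors but not the underlying problem of needing one query per cell.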

Is there a better way?

Cheers

Carsten

PS: Sorry for such an open question with not really matching tags, but I was completely unsure where to place it.

Why not use a table panel with color formatting for cells?

Because I mostly failed to get data formatted into a table.

I’ve just started again but am not really getting anywhere. I get both metrics via two queries (Instant/Table format), outer join on host, and then use the Organize fields transform to reduce the result to just the values I would like to continue working with (available CPU cores and memory), i.e. at this point I have 2 columns and hundreds of rows which I now need to 2D-bin.

And this is my current Friday afternoon mental break point, as I don’t see how I could cast this set with a lot of rows into a table like

| memory / cpus | ==1 | 1<X<=2 | 2<X<=4 | 4<X<=8 | …
| <1GByte | 15 | 10 | 8 | 9 | …
| 1GByte<=X<2GByte | …
| … |

Maybe I should really aggregate this before entering the data in the first place, but maybe I’m just too tired to think straight right now.
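For reference, a minimal sketch of what that pre-aggregation might look like in Python, turning one (CPUs, memory) row per host into counts per 2D cell. The bin edges, the sample rows, and the name `bin_2d` are all made up for illustration, and both axes use upper-inclusive bins here for simplicity.

```python
from bisect import bisect_left
from collections import Counter

GB = 10**9
CPU_EDGES = [1, 2, 4, 8]              # upper-inclusive bin edges (assumed)
MEM_EDGES = [1 * GB, 2 * GB, 4 * GB]  # upper-inclusive bin edges (assumed)


def bin_2d(rows):
    """Count hosts per (memory bin index, CPU bin index).

    rows: iterable of (available_cpus, available_memory_bytes).
    Index i means value <= EDGES[i]; index len(EDGES) is the overflow bin.
    """
    return Counter(
        (bisect_left(MEM_EDGES, mem), bisect_left(CPU_EDGES, cpus))
        for cpus, mem in rows
    )


# Made-up sample rows, one per host.
rows = [
    (1, 500_000_000),
    (2, 800_000_000),
    (3, 1_500_000_000),
    (4, 1_500_000_000),
]
counts = bin_2d(rows)

# Print one line per memory bin, one column per CPU bin.
for m in range(len(MEM_EDGES) + 1):
    cells = [counts[(m, c)] for c in range(len(CPU_EDGES) + 1)]
    print(" | ".join(str(n) for n in cells))
```

Each non-empty cell could then be written to Prometheus as its own gauge sample, with the two bin values as labels.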

How about something like this:

  1. In your reporting/aggregation, compute the bin values (so for example, instead of logging RAM=8400300500, you would log RAM=10000000000)
  2. Add a variable to your dashboard that is populated from a query that returns all of the different values (bin values) of RAM. Let’s call it “RAM_BIN”. Make sure “multi select” is enabled, and select all the values by default.
  3. Have a single query that looks something like:
    COUNT(RAM=${RAM_BIN}) by (CPUS)
  4. From that query, construct a bar graph with RAM_BIN as the “Repeat by variable”. Include ${RAM_BIN} in the panel title

This should get you multiple bar graphs, one for each value of RAM_BIN, where each bar corresponds to a CPU count and its height is the number of hosts that had that number of CPUs (for the specific RAM_BIN value).

It’s still not the 2d table/heatmap that you want, but it does allow you to display all of the data in a generic way (no hardcoding or manual repetition)
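Step 1 could look roughly like this in Python. The 1-2-5 series of bin values is an assumption, chosen so that the example above (8400300500 becoming 10000000000) works out, and `round_up_125` is a hypothetical name.

```python
def round_up_125(value: int) -> int:
    """Round a non-negative integer up to the next 1, 2 or 5 times a power of ten.

    Hypothetical helper; the 1-2-5 series is an assumed choice of bins.
    """
    if value <= 0:
        return 0
    base = 10 ** (len(str(value)) - 1)  # largest power of ten <= value
    # 10 * base always exceeds value, so a match is guaranteed.
    return next(f * base for f in (1, 2, 5, 10) if f * base >= value)


print(round_up_125(8400300500))  # 10000000000
```

Using integer arithmetic only avoids floating-point surprises right at the bin edges.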
