Alert on Prometheus metric using count but still retain labelset

  • Grafana version: v12.2.0

  • Operating System: Linux

I have a metric that indirectly tells me how many servers are running in each datacenter. It is indirect because the metric produces an unrelated value, but the number of values indicates the number of servers that are reporting. Each datacenter has a different number of servers. I am trying to create an alert that tells me when any of the datacenters drops below the specified number of servers. If I use the count function, I am able to achieve the desired result.

count(http_requests{datacenter="dc-001"}) < 5 or
count(http_requests{datacenter="dc-002"}) < 5 or
count(http_requests{datacenter="dc-003"}) < 5 or
count(http_requests{datacenter="dc-004"}) < 3

However, I still need the full labelset so that I can use those labels in the alert, and using `count` removes the labels.

If I use the by(label) option, I can get some of the labels back, but then because each metric produces a different labelset for each cluster, it warns me that it is dropping some of the unions.

How can I determine which datacenter drops below the specified number of servers while still retaining labels to be included in an alert?

In Grafana alerting you can drop the per-datacenter comparisons and run a single grouped query:

count by(datacenter, label, label2 ..) (http_requests{datacenter=~"dc-001|dc-002"})

Then add a second query that supplies the thresholds, for example the TestData data source's CSV Content scenario:

datacenter, value
dc-001, 5
dc-002, 1

Then add a Math expression that is true when the count falls below the threshold:

$A < $B

The math expression joins the two queries on their shared labels, so every result with the label datacenter=dc-001 is compared against threshold 5, and every result with datacenter=dc-002 against threshold 1. Note that datacenters with no matching row in the CSV are excluded from the output of the math expression.
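To make the join behaviour easier to picture, here is a rough Python model of the two steps: a grouped count that keeps labels, followed by a comparison against per-datacenter thresholds. It is only a sketch of the idea, not how Grafana implements expressions. The sample label sets, the `cluster` and `instance` label names, and the helper names `count_by` and `below_threshold` are all made up for illustration.

```python
from collections import Counter

# Hypothetical samples: one label set per reporting server.
samples = [
    {"datacenter": "dc-001", "cluster": "a", "instance": "s1"},
    {"datacenter": "dc-001", "cluster": "a", "instance": "s2"},
    {"datacenter": "dc-001", "cluster": "a", "instance": "s3"},
    {"datacenter": "dc-001", "cluster": "a", "instance": "s4"},
    {"datacenter": "dc-002", "cluster": "b", "instance": "s5"},
    {"datacenter": "dc-002", "cluster": "b", "instance": "s6"},
    {"datacenter": "dc-003", "cluster": "c", "instance": "s7"},
]

# Thresholds keyed by datacenter, mirroring the CSV rows above.
thresholds = {"dc-001": 5, "dc-002": 1}


def count_by(samples, keys):
    """Model of `count by(keys...)`: group on the kept labels, count series."""
    counts = Counter(tuple((k, s.get(k, "")) for k in keys) for s in samples)
    return dict(counts)


def below_threshold(counts, thresholds):
    """Model of `$A < $B`: match each group to the threshold that shares
    its datacenter label; groups without a threshold are dropped."""
    firing = []
    for labels, n in counts.items():
        dc = dict(labels).get("datacenter")
        if dc in thresholds and n < thresholds[dc]:
            firing.append((dict(labels), n))
    return firing


counts = count_by(samples, ["datacenter", "cluster"])
firing = below_threshold(counts, thresholds)
for labels, n in firing:
    print(labels, n)
```

In this model dc-001 fires with its `cluster` label still attached, dc-002 stays quiet, and dc-003 disappears because it has no threshold row. One thing the model also makes visible: every extra label in `by(...)` splits the count per unique label combination, so the threshold is applied to each group rather than to the datacenter as a whole.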