So let’s say we use InfluxDB with Telegraf to get metrics from hosts and we want to alert based on CPU utilization being high:
from(bucket: "metrics")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_idle")
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    |> group(columns: ["host"])
    |> drop(columns: ["_value", "_field", "_measurement", "_start", "_stop", "cpu", "_time"])
    |> group()
… and do some math to calculate load and when to trigger.
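To give an idea of the kind of math meant here, below is a rough sketch and not the exact calculation from the rule. It assumes utilization is taken as 100 minus the average `usage_idle` per host over the query range; using `mean()` as the aggregation is an assumption made for illustration.

```flux
// Sketch only: the real calculation in the rule is not shown in this post.
from(bucket: "metrics")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_idle")
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    // One table per host, so the rule produces one series per host.
    |> group(columns: ["host"])
    // Average idle percentage per host over the query range (assumed aggregation).
    |> mean()
    // Utilization is whatever is left after idle time.
    |> map(fn: (r) => ({r with _value: 100.0 - r._value}))
```

The alert condition would then compare `_value` for each host against a threshold.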
This works fine when all hosts are reporting their data. But if one host stops reporting, the rule doesn’t notice, because it is still receiving data from the other hosts. For this alert to start sending a No Data warning, all hosts need to stop sending data.
Is there a way to make a multi-dimensional rule recognize which hosts should be reporting?
The only way I can think of is to create a separate rule for each host, with a filter on the host name, but that defeats the purpose of a multi-dimensional rule.
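For reference, the per-host workaround would look roughly like the query below, repeated once for every host. The host name `web-01` is a made-up placeholder, not one of the real hosts.

```flux
// Sketch of the per-host workaround: one rule like this for each host.
// "web-01" is a hypothetical host name used only for illustration.
from(bucket: "metrics")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_idle")
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    // Pin the rule to a single host so that No Data applies to that host alone.
    |> filter(fn: (r) => r["host"] == "web-01")
```

Because each such rule only ever sees one host, it would go to No Data as soon as that host stops reporting, but the number of rules grows with the number of hosts.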