Detecting offline hosts - PromQL with absent

Hey guys,

I’m trying out the OpenTelemetry Collector and Alloy to replace my current Prometheus setup. They work differently: Prometheus scrapes my hosts to collect data, whereas the OTel Collector and Alloy push data to Prometheus (I’m testing with Grafana Cloud).

The thing is, I currently alert on up == 0, so I know when my hosts are offline (or more precisely, can’t be scraped). With the push model there is no up metric, and I haven’t figured out how to do the same thing in a way that scales. For example, right now I’m alerting on this:

absent_over_time(system_uptime_seconds{host_alias="web-prod-instance"}[1m])

But if I have 20 hosts, I will need to add every host name to the query. I tried a regex matcher instead, but then I can’t access host_alias in the alert summary.

Do you guys know a better way to do this?

Thanks in advance.

Hey @jangaraj thanks a lot for pointing that post out. I don’t remember seeing it when I was searching.

I created the alert with the proposed solution and it worked. For future reference, if you need to detect offline hosts when using the OpenTelemetry Collector hostmetrics receiver, here is the alert query I used (a little different from the one in that post, but it also works):

(count(count_over_time(system_uptime_seconds[1h])) by (host_alias) unless count(count_over_time(system_uptime_seconds[1m])) by (host_alias))
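
In case it helps anyone, here is a rough sketch of how the same query could look as a Prometheus-style alerting rule. The group name, alert name and severity label are made up for illustration, and I’m assuming a rule format like the one Prometheus itself uses, so adjust it to however your alerts are managed. The useful part is that both sides of the unless are grouped by host_alias, so the label survives and can be used in the summary:

```yaml
groups:
  - name: host-availability          # hypothetical group name
    rules:
      - alert: HostOffline           # hypothetical alert name
        # Hosts that reported in the last hour,
        # minus hosts that reported in the last minute.
        expr: |
          count by (host_alias) (count_over_time(system_uptime_seconds[1h]))
          unless
          count by (host_alias) (count_over_time(system_uptime_seconds[1m]))
        labels:
          severity: critical         # assumed label, not from my real setup
        annotations:
          summary: "Host {{ $labels.host_alias }} has stopped sending metrics"
```

One thing to keep in mind: a host only counts as "known" while it has samples inside the 1h window, so the alert for a host that stays offline should clear on its own after about an hour. A longer range on the left side keeps it firing for longer.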

Thanks again!