Hey guys,
I’m trying out the otel collector and Alloy to replace my current Prometheus setup, but they work differently: Prometheus scrapes my hosts to collect data, while otel/Alloy push data to Prometheus (I’m testing with Grafana Cloud).
The thing is, I currently alert on up == 0, so I know when my hosts are offline (or more precisely, can’t be scraped). I haven’t figured out how to do that without the up metric in a way that scales to many hosts. For example, right now I’m alerting on this:
absent_over_time(system_uptime_seconds{host_alias="web-prod-instance"}[1m])
But if I have 20 hosts, I will need to add every host name to the query. I tried a regex matcher instead, but then I can’t access host_alias in the alert summary.
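For illustration, the regex attempt looks something like the sketch below (the web-prod-.* pattern is a placeholder, not my exact query). As far as I understand, absent_over_time() only copies labels from equality matchers into its result, so with =~ the resulting series has no host_alias label for the summary to use.

```promql
# Sketch of the regex variant; "web-prod-.*" is a placeholder pattern.
# This returns a result only when NO series matching the selector has had
# a sample in the last minute, so a single missing host among several
# healthy ones would not trigger it.
absent_over_time(system_uptime_seconds{host_alias=~"web-prod-.*"}[1m])
```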
Do you guys know a better way to do this?
Thanks in advance.