Server Availability Alert

I’m playing with Grafana Cloud (free). I’ve deployed the Alloy agent to three different servers. I’m kinda shocked there isn’t any built-in alerting for when a server goes offline.

Looks like I can maybe use the up metric, but I’m struggling with the information I’ve found. A lot of it seems to be outdated (for example, on how to write a rule). My systems aren’t accessible from the Internet, so ping isn’t an option, and I’d rather not use an HTTP check.

Does anyone have suggestions on how to create a simple alert for an offline server?

-Keith

First, how do you know that the server is offline?

I just have the Alloy agent running. I was hoping to use data from that, but I’m open to suggestions.

It appears the up metric has a value of 1 when the server is reporting and no value at all when it stops.
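A minimal sketch of the logic this observation implies, in Python: because up simply stops arriving, the check has to look at how long ago each host last reported rather than at the value itself (in PromQL this is roughly what `absent_over_time` is for). The host names, timestamps and 5-minute window below are hypothetical, and a silent host only means nothing is arriving from it, not necessarily that the server is down.

```python
from datetime import datetime, timedelta


def silent_hosts(last_seen, now, window=timedelta(minutes=5)):
    """Return hosts whose most recent `up` sample is older than `window`.

    last_seen maps a (hypothetical) host name to the timestamp of the last
    sample received from it. A missing sample is treated as "no data",
    not as up == 0.
    """
    return sorted(host for host, ts in last_seen.items() if now - ts > window)


now = datetime(2024, 1, 1, 12, 0)
last_seen = {
    "server-a": now - timedelta(seconds=30),  # reporting normally
    "server-b": now - timedelta(minutes=12),  # silent for 12 minutes
    "server-c": now - timedelta(minutes=4),   # short gap, still inside window
}
print(silent_hosts(last_seen, now))  # ['server-b']
```

The window is the design choice here: too short and a single delayed scrape fires the alert, too long and a real outage is noticed late. Note that a host that never reported at all is not in `last_seen`, so it needs a separate "expected hosts" list.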

Correct. But it isn’t safe to assume the server is offline just because the metric is missing. Maybe only Alloy is down.
So you shouldn’t be shocked: once you think it through, “alert when a server is offline” turns out to be a very naive requirement.

You have to simplify it, e.g. assume the server is offline when it doesn’t respond to ping (though that may not be 100% true; maybe a network admin just blocked ping)
OR
the standard solution is to check the service that the server provides, because that is where the server’s main value lies. For example, for a web server: if it doesn’t respond to ping or has high CPU or memory usage, that wouldn’t be a high-priority alert for me as long as it still serves web pages with an acceptable response time.
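A rough sketch of the "check the service, not the box" idea, assuming a plain TCP connect is acceptable as a stand-in for a real service check (it is weaker than an HTTP check: it only proves something is listening, not that correct pages come back). It also measures connect time so a threshold can reflect "acceptable response time". The host, port and 1-second threshold are hypothetical, and since the servers in this thread aren’t reachable from the Internet, such a check would have to run from inside the same network.

```python
import socket
import time


def check_service(host, port, timeout=3.0):
    """Try a TCP connection to host:port.

    Returns the connect time in seconds, or None if the connection was
    refused, failed or timed out. A successful connect only shows that
    something is listening on the port.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return None


# Hypothetical target and threshold: use the port your server actually serves on.
elapsed = check_service("127.0.0.1", 80)
if elapsed is None or elapsed > 1.0:
    print("ALERT: service unreachable or slow")
else:
    print(f"OK ({elapsed * 1000:.0f} ms)")
```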

The up metric is generated by the scraper (Prometheus, or the Prometheus-style scraping inside Alloy) to indicate whether a scrape was successful (1 if successful, 0 if not). Make sure this metric is actually being collected from the Alloy agents deployed on your servers.

To create a simple alert for detecting when a server goes offline using Grafana Cloud and the Alloy agent, you can use the up metric from Prometheus. This metric indicates whether a target is up (1) or down (0).

That’s not correct. There can be a networking problem between the monitoring (Prometheus, Alloy,…) and the monitored server, in which case up will also be 0 even though the server itself is up and running.
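A sketch that keeps the two signals discussed in this thread apart: up == 0 (a scrape was attempted and failed, which may be a network problem) and no up series at all (nothing is reporting, e.g. the host or Alloy is down). It parses the JSON body that the Prometheus HTTP API returns for an instant query of `up` (`/api/v1/query`); fetching that response (URL, Grafana Cloud credentials) is left out, and the instance names are hypothetical. Neither result proves the server itself is down, only that monitoring can’t see it.

```python
def unhealthy_instances(response, expected):
    """Classify expected instances from an instant query of `up`.

    response: parsed JSON body of a Prometheus /api/v1/query call for `up`.
    expected: set of instance labels you expect to see (hypothetical names).
    Returns (scrape_failed, no_data): instances with up == 0, and expected
    instances that have no `up` series at all.
    """
    if response.get("status") != "success":
        raise ValueError("query failed")
    values = {}
    for series in response["data"]["result"]:
        instance = series["metric"].get("instance")
        value = float(series["value"][1])  # value is [timestamp, "string"]
        # If an instance has several series (e.g. several jobs), keep the
        # lowest value so any failed scrape is reported.
        values[instance] = min(values.get(instance, float("inf")), value)
    scrape_failed = sorted(i for i in expected if values.get(i) == 0.0)
    no_data = sorted(i for i in expected if i not in values)
    return scrape_failed, no_data


sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "server-a"}, "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "instance": "server-b"}, "value": [1700000000, "0"]},
        ],
    },
}
print(unhealthy_instances(sample, {"server-a", "server-b", "server-c"}))
# (['server-b'], ['server-c'])
```

Keeping the two lists separate lets them map to different alert messages ("scrape failing" vs. "no data"), rather than one "server offline" alert that is often wrong.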