I use the Telegraf ping plugin to ping the IP addresses of a few hosts and write the responses to the InfluxDB “ping” measurement. I currently use the default “exec” ping method rather than the “native” method of the Telegraf ping plugin (see the ping plugin documentation), since there was a recent issue with the native Go ping method, although this may now have been fixed.
I then use a Grafana graph panel to plot the ping results and send an (email/Slack) alert when the average ping result is above 0 over a 5-minute period. The Telegraf ping result is 0 if the host is online and 1 if not, so the alert only triggers if the node is offline for more than 5 minutes, but this is configurable. I also get an alert when the host comes back online.
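The alert logic described above (average the 0/1 ping results over a 5-minute window, fire when the average is above 0, notify again on recovery) can be sketched as follows. This is a minimal illustration, not Grafana's implementation: `window_average`, `evaluate` and the `(timestamp, result)` sample layout are hypothetical, and the result values are assumed to be 0 for "answered" and 1 for "no answer" as stated in the post.

```python
WINDOW_SECONDS = 5 * 60  # the 5-minute evaluation period from the post


def window_average(samples, now, window=WINDOW_SECONDS):
    """Average of the ping results with timestamps in (now - window, now].

    samples: list of (unix_timestamp, result) pairs, where result is
    assumed to be 0 when the host answered and 1 when it did not.
    Returns None when no sample falls inside the window.
    """
    recent = [result for ts, result in samples if now - window < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def evaluate(state, host, samples, now, threshold=0.0):
    """Update the per-host alert state and return a notification, if any.

    state maps host -> "alerting" or "ok". A message is returned only when
    the state changes, giving one "offline" and one "back online" message.
    An empty window counts as "ok" here; that choice is an assumption.
    """
    avg = window_average(samples, now)
    new = "alerting" if avg is not None and avg > threshold else "ok"
    old = state.get(host, "ok")
    state[host] = new
    if new == old:
        return None
    return f"{host}: offline" if new == "alerting" else f"{host}: back online"


if __name__ == "__main__":
    state = {}
    history = [(0, 0), (60, 1), (120, 1)]
    print(evaluate(state, "host-a", history, now=120))  # host-a: offline
    history += [(400, 0), (460, 0)]
    print(evaluate(state, "host-a", history, now=460))  # host-a: back online
```

Note the arithmetic of the threshold: with "above 0", a single failed ping inside the window already makes the average positive. A threshold just below 1 would instead require every sample in the window to have failed.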
However, my alert does not fire, since the panel covers several hosts: when a host drops, the metric does not change (I tested this by stopping the Telegraf service to simulate the host going down). Do you have to set this up per host?
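Stopping Telegraf does not make a host report 1: if the stopped instance is the one running the ping input, no new points are written at all, and if it runs on the target machine, that machine still answers pings. Either way there is no failed result for the average to pick up, which is why Grafana alert rules have a separate setting for what to do when a query returns no data. A minimal sketch of per-host evaluation that treats "no recent rows" as its own state follows; `host_status` and the `(host, timestamp, result)` row layout are hypothetical, assuming the ping measurement has already been split by a per-host tag.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the 5-minute evaluation period from the post


def host_status(points, now, window=WINDOW_SECONDS):
    """Classify each host as "ok", "down" or "no data".

    points: iterable of (host, unix_timestamp, result) rows, i.e. the ping
    measurement split by a per-host tag. result is assumed to be 0 when the
    host answered and 1 when it did not.
    """
    points = list(points)
    recent = defaultdict(list)
    for host, ts, result in points:
        if now - window < ts <= now:
            recent[host].append(result)

    status = {}
    for host in sorted({host for host, _, _ in points}):
        values = recent.get(host)
        if not values:
            # Nothing was written in the window (e.g. Telegraf was stopped),
            # so there is no average to compare against the threshold.
            status[host] = "no data"
        elif sum(values) / len(values) > 0:
            status[host] = "down"
        else:
            status[host] = "ok"
    return status


if __name__ == "__main__":
    rows = [("a", 100, 0), ("a", 200, 0), ("b", 100, 0), ("b", 200, 1), ("c", -500, 0)]
    print(host_status(rows, now=200))
```

Evaluating each host separately lets the notification name the host that is down, and keeps a host that has stopped reporting distinct from one that is reporting failures.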