Node down alert - Grafana

Hi,

Is it possible to monitor node status with Grafana? For example, if a node goes down, Grafana should trigger an alert.

I am looking at ICMP ping: when a node stops responding to ping, Grafana should trigger an alert.

We are using a Telegraf + InfluxDB + Grafana stack.

Regards
Kumar

I use the Telegraf ping plugin to ping the IP addresses of a few hosts and write the responses to the InfluxDB “ping” measurement. I currently use the “default” ping method rather than the “native” ping method of the Telegraf ping plugin (see the ping plugin documentation), since there was a recent issue with the native Go ping method, although that may now have been fixed.

I then simply use a Grafana graph to plot the ping measurement results, with an email/Slack alert when the average ping result is above 0 over a 5-minute period. The Telegraf ping result is 0 if the host is online and 1 if it is not, so the alert only triggers if the node is offline for more than 5 minutes, but this is configurable. I also then get an alert when the host comes back online.
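For reference, the Telegraf input section is roughly the following (just a minimal sketch, assuming a reasonably recent Telegraf; the host names are placeholders, and the commented-out lines only show where the count and ping method can be set, so check the ping plugin README for the options your version supports):

[[inputs.ping]]
  ## Hosts to send ping packets to (host names or IP addresses).
  urls = ["example.org", "10.0.0.1"]

  ## Number of ping packets to send per collection interval (optional).
  # count = 1

  ## Ping method: "exec" uses the OS ping binary (the default), "native" uses the Go implementation.
  # method = "exec"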

Hi,

Thanks for the update.

I tried it and it works well, however Grafana only sends the mail 5 to 8 minutes after the node goes offline.

This becomes a problem, especially in a production environment: if a node goes down we should get a mail within 1 or 2 minutes.

Regards
Kumar

Hi,

I modified the alert condition value to 0, which triggers an alert as soon as the node goes down.

[screenshot of the modified alert condition]

Regards
Kumar

Hello all, I know it's been a while on this one.

I am looking to do a similar thing. Where in the Telegraf config do you apply the IP address to ping?
Is it within the

[[inputs.ping]]
  ## Hosts to send ping packets to.
  urls = ["example.org"]

part?

Sorry, I'm a little confused.

Has anyone else used ping?
Any guidance?

I just did this yesterday. You simply put in the host names or IPs in quotes, separated by commas. For example:

urls = ["google.com", "10.0.0.1"]

Thanks for the info, much appreciated. Have you set up the alerting within Grafana for this?

I have set a query for the panel:

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "ping")
  |> filter(fn: (r) => r["_field"] == "reply_received")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

And the alert does not fire, as this covers a few hosts: when a host drops it does not change the metric (I tried this by stopping the Telegraf service, as this would replicate the host being down). Do you have to set this per host?
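Something like the below, perhaps? (Just a sketch; I'm assuming the ping plugin tags each point with a url tag for the host it pinged, and that 10.0.0.1 is one of the configured hosts.)

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "ping")
  |> filter(fn: (r) => r["_field"] == "reply_received")
  |> filter(fn: (r) => r["url"] == "10.0.0.1")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")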