Coloring Singlestat based on average and current value

mattbuerkle · July 22, 2019, 6:22am

Hi all,

first i need to say that i know grafana now for about 1 week. During this week i worked every day with it to get familiar with its functions. To summarize it: i love it!

I’m running grafana together with Telegraf and InfluxDB.
In Telegraf i have configured the ping plugin to ping a machine in the network every 30 seconds.
As result i have the values returned from the pings in the database.

Some background:

the machine i’m pinging is switch on and off manually
after the machine is switched on and is replying to pings, it still needs some time to be ready to work with
after replying to pings the machine after switch on the first time the machine takes about 3 to 5 minutes to be ready
so I defined, that the machine is ready after 10 minutes

This gives the following list of states:

No ping replay: machine off/not ready (not ready)
Replying to ping, but since less than 10 minutes: machine on, but not ready (starting)
Replying to ping and since at least 10 minutes: machine on and ready (ready)

I know that such procedure depends absolutely on the processes of the machine itself.
But in this case we accept this.

To offer a really easy to understand visualisation i’m thinking about the following:

using a SingleStat panel for showing the states “not ready”, “starting”, “ready”
in addition to the written state i would like to color the background accordingly (red, orange, green)

My idea is to use the result code of the ping (0 = success, 1 = no replay, 2 = other errors). Calculating the average of the result code over 10 minutes gives the following too me:

value = 0: macchine is ready (replies to ping since at least 10 minutes)
0 < value < 1: machine on, but not ready
1 <= value: machine is not ready

My current query:
SELECT mean(“result_code”) FROM “telegraf”.“autogen”.“ping” WHERE (“url” = ‘1.2.3.4’) GROUP BY time(10m) fill(null)

The only thing that is not like i need it is the average value between 0 and 1.
To differe if the machine was ‘off and is not starting’ or if the machine was ‘on and is now off’ i would need to have look an the derivation.

But here i’m on the point i don’t know how to do that.
I hope everybody understands what i have written.

Thank you very much.
Best regards
matt

jbrouwers · July 22, 2019, 9:03am

I’ve experienced a (kind of) similar case to count a failure of a machine and I’ve used the lag function of SQL.
However, I do not know whether influxDB also has a function like that.

In my case I’ve got the status per minute of a machine and I will only count a failure whenever the machine goes from Up to Down.

A simplified version of my query:

    WITH Failure_Time(Timestamp, Failure) as (
    Select 
    Timestamp,
    (CASE 
    WHEN (LAG(MachineStatus, 1) OVER (ORDER BY Machine, TIMESTAMP) = "UP") AND MachineStatus = "Down" THEN 1
    ELSE 0 END) AS Failure
     )

With this part I only take a look at 1 row above since it is ordered by Machine and Timestamp. The temporary table returns a 1 whenever the machine Failed.

Maybe this could help you around. Try to play a bit with it.