Coloring Singlestat based on average and current value

Hi all,

first i need to say that i know grafana now for about 1 week. During this week i worked every day with it to get familiar with its functions. To summarize it: i love it!

I’m running grafana together with Telegraf and InfluxDB.
In Telegraf i have configured the ping plugin to ping a machine in the network every 30 seconds.
As result i have the values returned from the pings in the database.

Some background:

  • the machine i’m pinging is switch on and off manually
  • after the machine is switched on and is replying to pings, it still needs some time to be ready to work with
  • after replying to pings the machine after switch on the first time the machine takes about 3 to 5 minutes to be ready
  • so I defined, that the machine is ready after 10 minutes

This gives the following list of states:

  • No ping replay: machine off/not ready (not ready)
  • Replying to ping, but since less than 10 minutes: machine on, but not ready (starting)
  • Replying to ping and since at least 10 minutes: machine on and ready (ready)

I know that such procedure depends absolutely on the processes of the machine itself.
But in this case we accept this.

To offer a really easy to understand visualisation i’m thinking about the following:

  • using a SingleStat panel for showing the states “not ready”, “starting”, “ready”
  • in addition to the written state i would like to color the background accordingly (red, orange, green)

My idea is to use the result code of the ping (0 = success, 1 = no replay, 2 = other errors). Calculating the average of the result code over 10 minutes gives the following too me:

  • value = 0: macchine is ready (replies to ping since at least 10 minutes)
  • 0 < value < 1: machine on, but not ready
  • 1 <= value: machine is not ready

My current query:
SELECT mean(“result_code”) FROM “telegraf”.“autogen”.“ping” WHERE (“url” = ‘1.2.3.4’) GROUP BY time(10m) fill(null)

The only thing that is not like i need it is the average value between 0 and 1.
To differe if the machine was ‘off and is not starting’ or if the machine was ‘on and is now off’ i would need to have look an the derivation.

But here i’m on the point i don’t know how to do that.
I hope everybody understands what i have written.

Thank you very much.
Best regards
matt

I’ve experienced a (kind of) similar case to count a failure of a machine and I’ve used the lag function of SQL.
However, I do not know whether influxDB also has a function like that.

In my case I’ve got the status per minute of a machine and I will only count a failure whenever the machine goes from Up to Down.

A simplified version of my query:

    WITH Failure_Time(Timestamp, Failure) as (
    Select 
    Timestamp,
    (CASE 
    WHEN (LAG(MachineStatus, 1) OVER (ORDER BY Machine, TIMESTAMP) = "UP") AND MachineStatus = "Down" THEN 1
    ELSE 0 END) AS Failure
     )

With this part I only take a look at 1 row above since it is ordered by Machine and Timestamp. The temporary table returns a 1 whenever the machine Failed.

Maybe this could help you around. Try to play a bit with it.