Issue with device "flapping"

Wondering if someone can help with the following issue or propose a workaround:

I have a number of MQTT devices, and I track whether they are online or not (0 = offline, 1 = online).

However, “flapping” occurs once in a while, i.e. a device rapidly switches offline/online within the same second, so both 0 and 1 appear under the same second even though 0/offline technically came before 1/online.

This shows up correctly in a “table” visualization.

However, when I try to track it visually (with a time series or “discrete” panel, for instance),
it seems to retain only the 0 (which I assume is the “first” value in that second).
In reality, we should see 1/online everywhere except for that split-second 0/offline.
Instead, it just “sticks” at 0/offline.
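To illustrate why a one-value-per-second view can stick at 0, here is a minimal Python sketch. It is not how Grafana or InfluxDB work internally; the sample data and the `bucket_by_second` helper are hypothetical. If the flap’s 0 and 1 both land in the 07:58:57 bucket and only the first value is kept, the panel shows 0, and since no new point arrives while the device stays online, it plausibly keeps showing 0 until the next state change.

```python
from datetime import datetime

# Hypothetical state changes (1 = online, 0 = offline), sorted by time,
# with sub-second timestamps: a flap at 07:58:57.
events = [
    (datetime(2021, 4, 1, 7, 58, 50), 1),
    (datetime(2021, 4, 1, 7, 58, 57, 120000), 0),
    (datetime(2021, 4, 1, 7, 58, 57, 480000), 1),
]


def bucket_by_second(events, pick):
    """Group values into 1-second buckets and keep one value per bucket via `pick`."""
    buckets = {}
    for ts, value in events:
        # Truncate to whole seconds, as a 1 s grouping would.
        buckets.setdefault(ts.replace(microsecond=0), []).append(value)
    return {second: pick(values) for second, values in buckets.items()}


keep_first = bucket_by_second(events, lambda values: values[0])
keep_last = bucket_by_second(events, lambda values: values[-1])

flap_second = datetime(2021, 4, 1, 7, 58, 57)
print(keep_first[flap_second])  # 0: the display "sticks" at offline
print(keep_last[flap_second])   # 1: the state the device actually ended up in
```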

Any workaround for this? I’m open to any ideas that would let the graphs correctly reflect reality.

Thanks!

I think your query is a bit funky. Having GROUP BY time(1s) means your values are aggregated to 1-second resolution, but distinct() will (based on what you’re showing) expose multiple values for a given timestamp. As you’ve seen, the visualizations can’t properly show multiple values at the same timestamp. I think you can do one of two things:

A) Don’t do any aggregation/grouping over time, i.e. remove everything from the GROUP BY line, as well as the distinct() call. That should return your raw data, i.e. all state changes, with the real timestamps at which each occurred (assuming these are recorded with sub-second resolution). The only issue is that you may need to play around with the visualizations to get what you need. I don’t remember whether you can tell the discrete panel to show the “last” value during periods where there is no data, but I suspect that’s what you’ll need.

B) Do aggregation over a shorter time interval. E.g. 0.1s or even shorter, i.e. whatever resolution is small enough to show distinct state changes at different times. I would also switch the distinct() aggregation to last() or similar, so you don’t end up with multiple values at a single timestamp.
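Both suggestions can be modeled with a small Python sketch. The data and helper names are hypothetical, and this imitates the idea rather than InfluxDB’s or Grafana’s implementation. `state_at` mimics option A’s “show the last known value” behavior on raw state changes; `bucket_last` mimics option B, a fixed-width grouping that keeps `last()` per bucket (buckets are aligned to the Unix epoch in this sketch).

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical raw state changes, sorted by time (1 = online, 0 = offline).
events = [
    (datetime(2021, 4, 1, 7, 58, 50), 1),
    (datetime(2021, 4, 1, 7, 58, 57, 120000), 0),
    (datetime(2021, 4, 1, 7, 58, 57, 480000), 1),
]


def state_at(events, t):
    """Option A: the most recent state at or before t (step / "hold last value")."""
    times = [ts for ts, _ in events]
    i = bisect_right(times, t)
    return events[i - 1][1] if i else None  # None: no data yet at time t


def bucket_last(events, width):
    """Option B: fixed-width buckets (like GROUP BY time(...)) keeping last()."""
    epoch = datetime(1970, 1, 1)
    buckets = {}
    for ts, value in events:
        start = epoch + ((ts - epoch) // width) * width  # floor to bucket start
        buckets[start] = value  # later events overwrite earlier ones -> last()
    return buckets


print(state_at(events, datetime(2021, 4, 1, 7, 58, 57, 300000)))  # 0 (mid-flap)
print(state_at(events, datetime(2021, 4, 1, 7, 59, 30)))          # 1 (back online)
print(bucket_last(events, timedelta(milliseconds=100)))
```

With 1 s buckets, `bucket_last` collapses the flap to a single 1, which also avoids the multiple-values-per-timestamp problem; with 100 ms buckets, the 0 and the 1 land in separate buckets, so the flap stays visible.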

Hope that helps!


Thanks for the prompt reply and the proposed solutions.

These changes produce the same result.
Does that mean the underlying data doesn’t have sub-second resolution? How can I validate the raw resolution?

When two results fall within the same 1 s interval, how is their order determined? Is there some logic to it, or is it arbitrary?
Since the order in which the results are reported reflects reality (mostly online, except for a fraction-of-a-second “flap”: 0/offline, then back to 1/online) and is reproducible (multiple instances, always the same order), could I force Grafana to treat the first value in as the oldest when two values fall within the same second and no finer resolution is available?

2021-04-01 07:58:57 0
2021-04-01 07:58:57 1

I am new to Grafana, so I can’t figure out how to get a resolution under 1 s.
The drop-down menu stops at 1s, and manually entering “0.1”, “0.1s”, “0,1s” or “0,1” returns “no data”. I assume this means the raw data resolution is 1 s? So we’re back to the same issue?
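One way to check the raw resolution is to look at the raw timestamps as integers rather than as formatted dates. A minimal sketch, assuming you have exported the raw timestamps as nanoseconds since the Unix epoch (if you are on InfluxDB 1.x, its HTTP `/query` endpoint accepts an `epoch=ns` parameter for this). The helper names and values below are illustrative, not taken from this thread’s data.

```python
NS_PER_SECOND = 1_000_000_000


def has_subsecond_precision(timestamps_ns):
    """True if any timestamp (int nanoseconds since the epoch) has a fractional second."""
    return any(ts % NS_PER_SECOND != 0 for ts in timestamps_ns)


def finest_step_ns(timestamps_ns):
    """Largest power-of-ten step (in ns, capped at 1 s) that divides every timestamp."""
    step = NS_PER_SECOND
    while step > 1 and any(ts % step for ts in timestamps_ns):
        step //= 10
    return step


# Illustrative raw values: two points within the same second.
raw = [1_617_263_937_120_000_000, 1_617_263_937_480_000_000]
print(has_subsecond_precision(raw))  # True
print(finest_step_ns(raw))           # 10000000 (10 ms)
```

A single whole-second timestamp proves little, since sub-second data can land on a whole second by chance; checking a larger sample of raw points gives a more reliable answer.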

EDIT: option A did work; I had to “refresh” the display…

Still, I’d be interested to understand how to “Do aggregation over a shorter time period. E.g. 0.1s or even shorter”. Would you mind confirming the correct field to use and the input format?
As stated above, I tried using “time” under “group by”, but the drop-down stops at 1s and I’m not sure how to enter a custom interval. Or am I simply not using the correct approach?

Great to hear it!

It’s possible that fractional duration literals aren’t supported, in which case you need to express the interval in a smaller unit, e.g. milliseconds (100ms). If Grafana doesn’t let you enter that in the UI editor, you can switch to raw query editing mode.
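The conversion described here (0.1 s becomes `100ms`) can be sketched as a small hypothetical Python helper that picks the largest unit dividing the interval exactly. It assumes InfluxQL’s duration units are `ns`, `u`, `ms`, `s`, `m`, `h`, `d` and `w`; check the documentation for your InfluxDB version.

```python
from fractions import Fraction

# Assumed InfluxQL duration units with their size in nanoseconds, largest first.
UNITS_NS = [
    ("w", 7 * 24 * 3600 * 10**9),
    ("d", 24 * 3600 * 10**9),
    ("h", 3600 * 10**9),
    ("m", 60 * 10**9),
    ("s", 10**9),
    ("ms", 10**6),
    ("u", 10**3),
    ("ns", 1),
]


def to_duration_literal(seconds):
    """Express an interval in seconds as an integer literal, e.g. 0.1 -> '100ms'."""
    ns = Fraction(str(seconds)) * 10**9  # exact decimal arithmetic, no float error
    if ns <= 0 or ns.denominator != 1:
        raise ValueError(f"{seconds!r} s is not a positive whole number of nanoseconds")
    ns = int(ns)
    for unit, size in UNITS_NS:
        if ns % size == 0:  # use the largest unit that divides the interval exactly
            return f"{ns // size}{unit}"


print(to_duration_literal(0.1))  # 100ms
print(to_duration_literal(1.5))  # 1500ms
print(to_duration_literal(60))   # 1m
```

The resulting literal then goes into the raw query’s grouping, e.g. `GROUP BY time(100ms)`.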

You were right again: I had to use xxxxms (100ms, for instance) rather than 0,1s.

I now have two options to proceed.
Thanks for your support, and have a great weekend!