Drops in data for Juniper EX queue sampling

Hello all, I am working with Juniper EX switches and streaming interface queue counters so I can see queue performance with more fidelity. I am running Grafana 10.2.2 and Telegraf 1.34.4 with the gNMI input plugin. One thing I have noticed is that roughly every 5 minutes I get drops in my graph. The data in InfluxDB 1.8 appears to be there and accurate, however Grafana shows these dips. I am using the following query, based on the 10-second sampling rate configured in Telegraf.

Grafana query:

SELECT
  NON_NEGATIVE_DERIVATIVE(LAST("interface/state/counters/out_queue/tail_drop_pkts"), 10s) AS "tail_drop_pkts_rate"
FROM "interface-queue"
WHERE
  $timeFilter
  AND "name" =~ /^$interface$/
GROUP BY
  time(10s),
  "name",
  "queue_number"
FILL(linear)

Influx query:

SELECT "interface/state/counters/out_queue/tail_drop_pkts" FROM "interface-queue" WHERE time > now() - 15m AND "name" = 'xe-0/2/0' AND "queue_number" = '1' LIMIT 30
name: interface-queue
time interface/state/counters/out_queue/tail_drop_pkts


1750187299741000000 24838490
1750187309925000000 24841052
1750187320037000000 24844117
1750187330257000000 24846978
1750187340454000000 24849996
1750187350719000000 24852953
1750187361006000000 24855590
1750187371254000000 24858689
1750187381444000000 24861047
1750187391760000000 24864120
1750187401939000000 24866879
1750187412311000000 24870815
1750187422696000000 24873275
1750187432864000000 24876217
1750187442994000000 24878780
1750187453177000000 24882145
1750187463457000000 24884429
1750187473655000000 24887384
1750187483930000000 24890203
1750187494106000000 24893957
1750187504229000000 24897209
1750187514602000000 24900111
1750187524926000000 24903086
1750187535217000000 24905250
1750187545504000000 24908120
1750187555674000000 24911234
1750187565806000000 24913844
1750187575977000000 24917650
1750187586382000000 24921059
1750187596538000000 24923830

Try this…

SELECT
NON_NEGATIVE_DERIVATIVE(MAX("interface/state/counters/out_queue/tail_drop_pkts"), 10s) AS "tail_drop_pkts_rate"
FROM "interface-queue"
WHERE $timeFilter
AND "name" =~ /^$interface$/
GROUP BY time(10s), "name", "queue_number"
FILL(previous)

That seems to make it look worse. I don’t think it is the data. What else should I look at?

So why do you believe something is wrong when you have this data?
Maybe you have something that clears the queue every 5 minutes, in which case you would of course see fewer packet drops at that time.

Routers and switches don't behave that way; they are constantly servicing the queues on microsecond timescales. I feel like it's a timing problem with the data timestamps, and that is why Grafana shows the drop in traffic.

I guess that's because of imperfection in the collection timing. The average time difference in the provided data set is 10.234 s (not exactly 10 s), so roughly every 30th sample (about 5 minutes; 30 × 10.234 = 307.02 s, not 300 s) slips into a different time bucket. That changes the time grouping at the InfluxDB level: every so often a 10 s bucket ends up with no points at all.
You could try lowering the collection interval (e.g. to 5 s, provided that won't overload the device), so there are more data points per grouping window.
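
Alternatively, keep the 10 s collection and group on a window wider than the real sample spacing, so every bucket is guaranteed at least one point. A rough sketch of your query with that change (30s is just an example; anything comfortably above the ~10.2 s real spacing should do, and the 10s unit in NON_NEGATIVE_DERIVATIVE still expresses the rate per 10 seconds regardless of the bucket width):

SELECT
  NON_NEGATIVE_DERIVATIVE(LAST("interface/state/counters/out_queue/tail_drop_pkts"), 10s) AS "tail_drop_pkts_rate"
FROM "interface-queue"
WHERE
  $timeFilter
  AND "name" =~ /^$interface$/
GROUP BY
  time(30s),
  "name",
  "queue_number"
FILL(none)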

Graph a simple count of your data points per bucket:

SELECT
  COUNT("interface/state/counters/out_queue/tail_drop_pkts")
FROM "interface-queue"
WHERE
  $timeFilter
  AND "name" =~ /^$interface$/
GROUP BY
  time(10s),
  "name",
  "queue_number"

If the count drops to 0 in the same buckets where the rate dips, that confirms it is the grouping, not the data.

Makes sense and thanks for the follow up.