Summed metrics doesn't show real data

I’ve some problems with summing metrics.
Stacking is working just fine, either used by stacking options, or series override options. It can show real data, but… it does draw more than one line, and I need a graph with one line showing summed value from 2 columns.

####1st graph query:

SELECT non_negative_derivative(mean(“value”), 1s) 8 FROM “ifHCInOctets” WHERE “column” =~ /xe-2.(670|680)/ AND timeFilter GROUP BY time(__interval) fill(null)

####2nd graph queries:

SELECT non_negative_derivative(mean(“value”), 1s) *8 FROM “ifHCInOctets” WHERE “column” = ‘xe-2/0/1.670’ AND timeFilter GROUP BY time(__interval) fill(null)
SELECT non_negative_derivative(mean(“value”), 1s) *8 FROM “ifHCInOctets” WHERE “column” = ‘xe-2/0/1.680’ AND timeFilter GROUP BY time(__interval) fill(null)

On 2nd graph columns are stacked, and it shows real data.
On 1st graph without stacking, there is one line (desired), but it shows some values like 250Tbps, which is nowhere near these real values.

Why doesn’t the first graph show summed value?
Tried on 2 different grafana versions (3.0.0-beta51460725904 and v4.2.0-pre1).
Tried to change mean() to sum().
Tried few methods of stacking, but it doesn’t work as I need it to.
How can I fix that?

Don’t think the InfluxDB query language supports this use case sadly.

The problem with the first query is that it takes the mean of counters from different interfaces which is obviously not what you want.

Changing the query

SELECT non_negative_derivative(mean(“value”), 1s) 8 FROM “ifHCInOctets” WHERE “column” =~ /xe-2.(670|680)/ AND $timeFilter GROUP BY time($__interval) fill(null)

to

SELECT non_negative_derivative(sum(“value”), 1s) / 8 FROM “ifHCInOctets” WHERE “column” =~ /xe-2.(670|680)/ AND $timeFilter GROUP BY time($__interval) fill(null)

should work in your case if you poll and store data at some regular interval deltaT and $__interval in GROUP BY time is less or equal to this polling period. In this case influxdb will sum all ifHCInOctet counters in one time slot first and then calculate derivative. It will also work if $__interval is multiple of deltaT steps. This is a crude and fragile approach though unless you replace auto-assigned $__interval in query with a fixed value = deltaT.

I’ve already tried to change mean() to sum(). But it doesn’t help. Results are pretty much the same.
I’m gathering data with regular intervals of 30 seconds. And I’ve also tried to change $__interval to various values, from 1s to 15m. There always is some unexpected value on graphs.

Yes, this approach is fragile. It depends heavily on data points from each poll cycle been within the same aggregation interval. Then with $__interval == 30s sum will actually take a sum of last counters.
If you poll with 30s period but timestamps of interfaces counters in database are not aligned but spread over this 30s period you are out of luck - the sum may:

  • take last counter value for one interface and previous counter for another interface resulting in underestimated value.
  • take 2 counters of one interface and 1 counter for another and produce a huge spike in graph

So, may try to align timestamps at 30s boundaries when data is put into db.
Another way is to use influxdb subquery to first calculate per interface rate with non_negative_derivative(last(value),1s) functions (and group by time(30s),“column”) and then use sum function in outer query. Use “raw” mode in query editor in Grafana.

2 Likes

Well, you gave me an idea about using last() to take last metrics, and then calculate real value.
I used that idea to create custom collector for telegraf to calculate this data, and well, it seems to be working fine.
And I have this data separated, which is also a plus in my case.
I’m not sure if this is the best approach possible, but it works, and I will take it.
Thanks! :smiley:

I have some problem while plotting the interface stats… while i use Tranform as Non negative derivative it gives me negative Y Axis…

Data Source : Graphite.

After a long debugging we found that dashed line comes up every time interval in graphite composer. We suspect that data is not been polled correctly… Can anybody help please

help with Graphite server polling? Graphite does not poll you send data you graphite (via carbon)

Thanks Torkel. Can you help me to give a query for graphite datasource where i can derive Total count for the Hosts which are down?

For eg. SQL query

select count(*) from tablename;

hm… searched the graphite functions docs for the word count, and found this function, countSeries, looks like its what you want paired with one of the series filtering functions (removeAboveXX removeBelowXX)

http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.countSeries