Is there an easy way to drop the first and last values/rows?
I was thinking something along the lines of this:
|> filter(fn: (r) => r != first() and r != last())
Which task are you trying to solve?
Here is a simple code snippet for filtering out the first and last records:
import "sampledata"
data = sampledata.int()
first = data
|> first()
|> tableFind(fn: (key) => key.tag == "t1")
|> getRecord(idx: 0)
last = data
|> last()
|> tableFind(fn: (key) => key.tag == "t1")
|> getRecord(idx: 0)
data
|> filter(fn: (r) => r._value != first._value and r._value != last._value)
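One caveat worth noting: the final filter compares on `_value`, so any unrelated row that happens to share the first or last value would also be dropped. If timestamps are unique, a sketch that compares on `_time` instead is safer:

```flux
data
    |> filter(fn: (r) => r._time != first._time and r._time != last._time)
```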
It’s a bit related to another issue I was trying to solve: InfluxDB sum of all values - #3 by jjm
I wanted to sum all the values, but the inconsistent timestamps were causing issues.
After a lot of trial and error, I finally got it working consistently with this:
option minWindowPeriod = 2m
from(bucket: "bucket1")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Cisco-IOS-XR-sysadmin-fretta-envmon-ui:environment/oper/power/location/pem_attributes")
|> filter(fn: (r) => r["_field"] == "system_power_input" and r["_value"] < 1000)
|> group(columns: ["source"])
|> window(every: if float(v: int(v: v.windowPeriod) * 2) < float(v: int(v: minWindowPeriod)) then minWindowPeriod else duration(v: int(v: v.windowPeriod) * 2))
|> mean()
|> duplicate(column: "_stop", as: "_time")
|> window(every: inf)
|> group(columns: ["_time"])
|> sum(column: "_value")
|> duplicate(column: "_time", as: "_stop")
|> duplicate(column: "_time", as: "_start")
|> group()
|> yield(name: "power")
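The `window(every: ...)` expression above is just clamping the interval to a minimum: it uses twice the dashboard’s `v.windowPeriod`, but never less than `minWindowPeriod`. Since `int()` already converts both durations to comparable integer nanoseconds, the `float()` casts aren’t strictly needed; an equivalent sketch:

```flux
option minWindowPeriod = 2m

// Twice the dashboard window, but never below the configured minimum
windowEvery =
    if int(v: v.windowPeriod) * 2 < int(v: minWindowPeriod) then
        minWindowPeriod
    else
        duration(v: int(v: v.windowPeriod) * 2)

// ... |> window(every: windowEvery)
```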
The only issue is that the first and last windows may not contain values from all devices, making the first and last sums potentially incorrect.
I modified the query based on your example:
option minWindowPeriod = 2m
power = from(bucket: "bucket1")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Cisco-IOS-XR-sysadmin-fretta-envmon-ui:environment/oper/power/location/pem_attributes")
|> filter(fn: (r) => r["_field"] == "system_power_input" and r["_value"] < 1000)
|> group(columns: ["source"])
|> window(every: if float(v: int(v: v.windowPeriod) * 2) < float(v: int(v: minWindowPeriod)) then minWindowPeriod else duration(v: int(v: v.windowPeriod) * 2))
|> mean()
|> duplicate(column: "_stop", as: "_time")
|> window(every: inf)
|> group(columns: ["_time"])
|> sum(column: "_value")
|> duplicate(column: "_time", as: "_stop")
|> duplicate(column: "_time", as: "_start")
|> group()
first = power
|> first()
|> tableFind(fn: (key) => key.tag == "t1")
|> getRecord(idx: 0)
last = power
|> last()
|> tableFind(fn: (key) => key.tag == "t1")
|> getRecord(idx: 0)
power
|> filter(fn: (r) => r._value != first._value and r._value != last._value)
|> yield(name: "power")
But I’m getting an error:
error calling function "tableFind" @20:8-20:47: no table found
My guess is that I’ve modified the default InfluxDB table structure in such a way that this no longer works.
Here’s a CSV of my original query’s output as an example of what the table looks like.
#group,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,double,dateTime:RFC3339,dateTime:RFC3339
#default,power,,,,,
,result,table,_time,_value,_stop,_start
,,0,2024-11-27T13:50:00Z,4092,2024-11-27T13:50:00Z,2024-11-27T13:50:00Z
,,0,2024-11-27T13:52:00Z,18499.5,2024-11-27T13:52:00Z,2024-11-27T13:52:00Z
,,0,2024-11-27T13:54:00Z,18502.5,2024-11-27T13:54:00Z,2024-11-27T13:54:00Z
,,0,2024-11-27T13:56:00Z,18504,2024-11-27T13:56:00Z,2024-11-27T13:56:00Z
,,0,2024-11-27T13:58:00Z,18507.5,2024-11-27T13:58:00Z,2024-11-27T13:58:00Z
,,0,2024-11-27T14:00:00Z,18505.5,2024-11-27T14:00:00Z,2024-11-27T14:00:00Z
,,0,2024-11-27T14:02:00Z,18505.5,2024-11-27T14:02:00Z,2024-11-27T14:02:00Z
,,0,2024-11-27T14:04:00Z,18502,2024-11-27T14:04:00Z,2024-11-27T14:04:00Z
,,0,2024-11-27T14:06:00Z,18504.5,2024-11-27T14:06:00Z,2024-11-27T14:06:00Z
,,0,2024-11-27T14:08:00Z,18502,2024-11-27T14:08:00Z,2024-11-27T14:08:00Z
,,0,2024-11-27T14:10:00Z,18506.5,2024-11-27T14:10:00Z,2024-11-27T14:10:00Z
,,0,2024-11-27T14:12:00Z,18500.5,2024-11-27T14:12:00Z,2024-11-27T14:12:00Z
,,0,2024-11-27T14:14:00Z,18502,2024-11-27T14:14:00Z,2024-11-27T14:14:00Z
,,0,2024-11-27T14:16:00Z,18502.5,2024-11-27T14:16:00Z,2024-11-27T14:16:00Z
,,0,2024-11-27T14:18:00Z,18497,2024-11-27T14:18:00Z,2024-11-27T14:18:00Z
,,0,2024-11-27T14:20:00Z,18494.5,2024-11-27T14:20:00Z,2024-11-27T14:20:00Z
,,0,2024-11-27T14:22:00Z,18495,2024-11-27T14:22:00Z,2024-11-27T14:22:00Z
,,0,2024-11-27T14:24:00Z,18497,2024-11-27T14:24:00Z,2024-11-27T14:24:00Z
,,0,2024-11-27T14:26:00Z,18498.5,2024-11-27T14:26:00Z,2024-11-27T14:26:00Z
,,0,2024-11-27T14:28:00Z,18500.5,2024-11-27T14:28:00Z,2024-11-27T14:28:00Z
,,0,2024-11-27T14:30:00Z,18498,2024-11-27T14:30:00Z,2024-11-27T14:30:00Z
,,0,2024-11-27T14:32:00Z,18497.5,2024-11-27T14:32:00Z,2024-11-27T14:32:00Z
,,0,2024-11-27T14:34:00Z,18501.5,2024-11-27T14:34:00Z,2024-11-27T14:34:00Z
,,0,2024-11-27T14:36:00Z,18505.5,2024-11-27T14:36:00Z,2024-11-27T14:36:00Z
,,0,2024-11-27T14:38:00Z,18509,2024-11-27T14:38:00Z,2024-11-27T14:38:00Z
,,0,2024-11-27T14:40:00Z,18505.5,2024-11-27T14:40:00Z,2024-11-27T14:40:00Z
,,0,2024-11-27T14:42:00Z,18501.5,2024-11-27T14:42:00Z,2024-11-27T14:42:00Z
,,0,2024-11-27T14:44:00Z,18499,2024-11-27T14:44:00Z,2024-11-27T14:44:00Z
,,0,2024-11-27T14:46:00Z,18502,2024-11-27T14:46:00Z,2024-11-27T14:46:00Z
,,0,2024-11-27T14:48:00Z,18503,2024-11-27T14:48:00Z,2024-11-27T14:48:00Z
,,0,2024-11-27T14:49:52.671401559Z,18503.5,2024-11-27T14:49:52.671401559Z,2024-11-27T14:49:52.671401559Z
As you can see, the values are more or less consistent over time, as they should be, but the first is wildly off, and at times the last can be as well. The best way to solve it would be to simply drop them from the table.
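For what it’s worth, the `tableFind` failure is likely not about the data itself: the trailing `group()` leaves the stream as a single table with an empty group key, so a predicate like `key.tag == "t1"` can never match (there is no `tag` column in the key). A predicate that accepts any table should locate it, assuming `power` is defined as above:

```flux
firstRow = power
    |> first()
    |> tableFind(fn: (key) => true)
    |> getRecord(idx: 0)
```

Flux’s `findRecord(fn: (key) => true, idx: 0)` combines the `tableFind` and `getRecord` steps into one call.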
This topic seems to be discussing the exact same problem: Removing first & last data points
Ok, I think I’ve finally got something that works.
rowCount = data
|> count(column: "_value") // Count total rows
|> findRecord(fn: (key) => true, idx: 0) // Extract row count as a record
data
|> tail(offset: 1, n: int(v: rowCount._value) - 2) // Remove the first and last row
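To sanity-check the pattern in isolation, here is a minimal sketch against `sampledata` (assuming a single, ungrouped table sorted by time):

```flux
import "sampledata"

data = sampledata.int()
    |> group()                      // merge into one table with an empty group key
    |> sort(columns: ["_time"])     // ensure row order before trimming

rowCount = data
    |> count(column: "_value")                 // total number of rows
    |> findRecord(fn: (key) => true, idx: 0)   // extract the count as a record

data
    |> tail(offset: 1, n: int(v: rowCount._value) - 2)  // drop first and last rows
```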
The whole query then looks like this:
option minWindowPeriod = 2m
data = from(bucket: "bucket1")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Cisco-IOS-XR-sysadmin-fretta-envmon-ui:environment/oper/power/location/pem_attributes")
|> filter(fn: (r) => r["_field"] == "system_power_input" and r["_value"] < 1000)
|> group(columns: ["source"])
|> window(every: if float(v: int(v: v.windowPeriod) * 2) < float(v: int(v: minWindowPeriod)) then minWindowPeriod else duration(v: int(v: v.windowPeriod) * 2))
|> mean()
|> duplicate(column: "_stop", as: "_time")
|> window(every: inf)
|> group(columns: ["_time"])
|> sum(column: "_value")
|> duplicate(column: "_time", as: "_stop")
|> duplicate(column: "_time", as: "_start")
|> group()
rowCount = data
|> count(column: "_value") // Count total rows
|> findRecord(fn: (key) => true, idx: 0) // Extract row count as a record
data
|> tail(offset: 1, n: int(v: rowCount._value) - 2) // Remove the first and last row
|> yield(name: "power")