Influx - drop the first and last rows/values from the table

Is there an easy way to drop the first and last values/rows?

I was thinking something along the lines of this:

|> filter(fn: (r) => r != first() and r != last()) 

Which task are you trying to solve?

Here is a simple code snippet for filtering out the first and last records:

import "sampledata"

data = sampledata.int()

first = data
    |> first()
    |> tableFind(fn: (key) => key.tag == "t1")
    |> getRecord(idx: 0)

last = data
    |> last()
    |> tableFind(fn: (key) => key.tag == "t1")
    |> getRecord(idx: 0)

data
  |> filter(fn: (r) => r._value != first._value and r._value != last._value)    
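
One caveat with filtering on _value: it also drops any interior rows that happen to share the first or last value. Assuming timestamps are unique per row, reusing the same first and last records but filtering on _time instead avoids that:

// Timestamps identify rows uniquely, so only the true first and last rows are dropped
data
    |> filter(fn: (r) => r._time != first._time and r._time != last._time)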

It’s a bit related to another issue I was trying to solve: InfluxDB sum of all values - #3 by jjm

I wanted to sum all the values, but the inconsistent times were causing issues.

Through lots of trial and error, I finally got it working consistently with this:

option minWindowPeriod = 2m

from(bucket: "bucket1")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Cisco-IOS-XR-sysadmin-fretta-envmon-ui:environment/oper/power/location/pem_attributes")
  |> filter(fn: (r) => r["_field"] == "system_power_input" and r["_value"] < 1000)
  |> group(columns: ["source"])
  // Window at twice the dashboard period, but never narrower than minWindowPeriod,
  // so every window should catch at least one sample from each device
  |> window(every: if float(v: int(v: v.windowPeriod) * 2) < float(v: int(v: minWindowPeriod)) then minWindowPeriod else duration(v: int(v: v.windowPeriod) * 2))
  |> mean()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)
  // All devices now share the window timestamps, so sum across devices per timestamp
  |> group(columns: ["_time"])
  |> sum(column: "_value")
  |> duplicate(column: "_time", as: "_stop")
  |> duplicate(column: "_time", as: "_start")
  |> group()
  |> yield(name: "power")

The only issue is that the first and last window may not contain values from all devices, which makes the first and last sums potentially incorrect.

I modified the query based on your example:

option minWindowPeriod = 2m

power = from(bucket: "bucket1")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Cisco-IOS-XR-sysadmin-fretta-envmon-ui:environment/oper/power/location/pem_attributes")
  |> filter(fn: (r) => r["_field"] == "system_power_input" and r["_value"] < 1000)
  |> group(columns: ["source"])
  |> window(every: if float(v: int(v: v.windowPeriod) * 2) < float(v: int(v: minWindowPeriod)) then minWindowPeriod else duration(v: int(v: v.windowPeriod) * 2))
  |> mean()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)
  |> group(columns: ["_time"])
  |> sum(column: "_value")
  |> duplicate(column: "_time", as: "_stop")
  |> duplicate(column: "_time", as: "_start")  
  |> group()

first = power
    |> first()
    |> tableFind(fn: (key) => key.tag == "t1")
    |> getRecord(idx: 0)

last = power
    |> last()
    |> tableFind(fn: (key) => key.tag == "t1")
    |> getRecord(idx: 0)

power
  |> filter(fn: (r) => r._value != first._value and r._value != last._value) 
  |> yield(name: "power")

But I’m getting an error:

 error calling function "tableFind" @20:8-20:47: no table found

My guess is that I modified the default Influx table structure in such a way that this no longer works.
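
If so, the likely culprit is the final group(): it empties the group key, so key.tag no longer exists and the tableFind predicate matches no table. Assuming the ungrouped output is a single table, matching any group key should presumably work instead:

first = power
    |> first()
    |> tableFind(fn: (key) => true) // Match the first (and only) table
    |> getRecord(idx: 0)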

Here’s a CSV of the output of my original query, as an example of what the table looks like.

#group,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,double,dateTime:RFC3339,dateTime:RFC3339
#default,power,,,,,
,result,table,_time,_value,_stop,_start
,,0,2024-11-27T13:50:00Z,4092,2024-11-27T13:50:00Z,2024-11-27T13:50:00Z
,,0,2024-11-27T13:52:00Z,18499.5,2024-11-27T13:52:00Z,2024-11-27T13:52:00Z
,,0,2024-11-27T13:54:00Z,18502.5,2024-11-27T13:54:00Z,2024-11-27T13:54:00Z
,,0,2024-11-27T13:56:00Z,18504,2024-11-27T13:56:00Z,2024-11-27T13:56:00Z
,,0,2024-11-27T13:58:00Z,18507.5,2024-11-27T13:58:00Z,2024-11-27T13:58:00Z
,,0,2024-11-27T14:00:00Z,18505.5,2024-11-27T14:00:00Z,2024-11-27T14:00:00Z
,,0,2024-11-27T14:02:00Z,18505.5,2024-11-27T14:02:00Z,2024-11-27T14:02:00Z
,,0,2024-11-27T14:04:00Z,18502,2024-11-27T14:04:00Z,2024-11-27T14:04:00Z
,,0,2024-11-27T14:06:00Z,18504.5,2024-11-27T14:06:00Z,2024-11-27T14:06:00Z
,,0,2024-11-27T14:08:00Z,18502,2024-11-27T14:08:00Z,2024-11-27T14:08:00Z
,,0,2024-11-27T14:10:00Z,18506.5,2024-11-27T14:10:00Z,2024-11-27T14:10:00Z
,,0,2024-11-27T14:12:00Z,18500.5,2024-11-27T14:12:00Z,2024-11-27T14:12:00Z
,,0,2024-11-27T14:14:00Z,18502,2024-11-27T14:14:00Z,2024-11-27T14:14:00Z
,,0,2024-11-27T14:16:00Z,18502.5,2024-11-27T14:16:00Z,2024-11-27T14:16:00Z
,,0,2024-11-27T14:18:00Z,18497,2024-11-27T14:18:00Z,2024-11-27T14:18:00Z
,,0,2024-11-27T14:20:00Z,18494.5,2024-11-27T14:20:00Z,2024-11-27T14:20:00Z
,,0,2024-11-27T14:22:00Z,18495,2024-11-27T14:22:00Z,2024-11-27T14:22:00Z
,,0,2024-11-27T14:24:00Z,18497,2024-11-27T14:24:00Z,2024-11-27T14:24:00Z
,,0,2024-11-27T14:26:00Z,18498.5,2024-11-27T14:26:00Z,2024-11-27T14:26:00Z
,,0,2024-11-27T14:28:00Z,18500.5,2024-11-27T14:28:00Z,2024-11-27T14:28:00Z
,,0,2024-11-27T14:30:00Z,18498,2024-11-27T14:30:00Z,2024-11-27T14:30:00Z
,,0,2024-11-27T14:32:00Z,18497.5,2024-11-27T14:32:00Z,2024-11-27T14:32:00Z
,,0,2024-11-27T14:34:00Z,18501.5,2024-11-27T14:34:00Z,2024-11-27T14:34:00Z
,,0,2024-11-27T14:36:00Z,18505.5,2024-11-27T14:36:00Z,2024-11-27T14:36:00Z
,,0,2024-11-27T14:38:00Z,18509,2024-11-27T14:38:00Z,2024-11-27T14:38:00Z
,,0,2024-11-27T14:40:00Z,18505.5,2024-11-27T14:40:00Z,2024-11-27T14:40:00Z
,,0,2024-11-27T14:42:00Z,18501.5,2024-11-27T14:42:00Z,2024-11-27T14:42:00Z
,,0,2024-11-27T14:44:00Z,18499,2024-11-27T14:44:00Z,2024-11-27T14:44:00Z
,,0,2024-11-27T14:46:00Z,18502,2024-11-27T14:46:00Z,2024-11-27T14:46:00Z
,,0,2024-11-27T14:48:00Z,18503,2024-11-27T14:48:00Z,2024-11-27T14:48:00Z
,,0,2024-11-27T14:49:52.671401559Z,18503.5,2024-11-27T14:49:52.671401559Z,2024-11-27T14:49:52.671401559Z

As you can see, the values are more or less consistent through time, as they should be, but the first is wildly off and, at times, the last can be as well. The best way to solve this would be to simply drop both rows from the table.

This topic seems to be discussing the exact same problem: Removing first & last data points

Ok, I think I’ve finally got something that works.

rowCount = data
  |> count(column: "_value") // Count total rows
  |> findRecord(fn: (key) => true, idx: 0) // Extract row count as a record

data
  // tail() keeps the last n rows, skipping offset rows from the end:
  // offset: 1 drops the last row, and n = total - 2 then also drops the first
  |> tail(offset: 1, n: int(v: rowCount._value) - 2)
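
A self-contained version of the same trick, using the sampledata tables from earlier (ungrouped first so the whole stream is a single table, as in my query), would look roughly like this:

import "sampledata"

// Collapse everything into one ungrouped table, ordered by time
data = sampledata.int()
    |> group()
    |> sort(columns: ["_time"])

rowCount = data
    |> count(column: "_value") // Count total rows
    |> findRecord(fn: (key) => true, idx: 0) // Extract the count as a record

data
    |> tail(offset: 1, n: int(v: rowCount._value) - 2) // Drop first and last row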

The whole query then looks like this:

option minWindowPeriod = 2m

data = from(bucket: "bucket1")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Cisco-IOS-XR-sysadmin-fretta-envmon-ui:environment/oper/power/location/pem_attributes")
  |> filter(fn: (r) => r["_field"] == "system_power_input" and r["_value"] < 1000)
  |> group(columns: ["source"])
  |> window(every: if float(v: int(v: v.windowPeriod) * 2) < float(v: int(v: minWindowPeriod)) then minWindowPeriod else duration(v: int(v: v.windowPeriod) * 2))
  |> mean()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)
  |> group(columns: ["_time"])
  |> sum(column: "_value")
  |> duplicate(column: "_time", as: "_stop")
  |> duplicate(column: "_time", as: "_start")  
  |> group()

rowCount = data
  |> count(column: "_value") // Count total rows
  |> findRecord(fn: (key) => true, idx: 0) // Extract row count as a record

data
  |> tail(offset: 1, n: int(v: rowCount._value) - 2) // Remove the first and last row
  |> yield(name: "power")