Optimize a long query with influxdb

hhoe · December 6, 2023, 12:32pm

Hello!

I´m using:
Grafana
InfluxDB: v2.7.3

I like to display the consumption of each month from my smartmeter:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
  |> filter(fn: (r) => r["_value"] > 0)
  |> difference(nonNegative: true)
  |> aggregateWindow(every: 1mo, fn: sum, createEmpty: false)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()

After a loooong waiting time, I get the result:

Is my query very slow?

grant2 · December 6, 2023, 10:40pm

Hi @hhoe

Is your electricity meter value always increasing? If yes, perhaps using the spread function will be more efficient? Something like this:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> aggregateWindow(every: 1mo, fn: spread, createEmpty: false)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()

hhoe · December 7, 2023, 6:14am

Thank you @grant2

Yes, its always increasing. But this dont work:

I`m thinking also about this direction:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

seven = from(bucket: "house")
  |> range(start: date.add(d: 6mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 6mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

eight = from(bucket: "house")
  |> range(start: date.add(d: 7mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 7mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

nine = from(bucket: "house")
  |> range(start: date.add(d: 8mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 8mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

sum = union(tables: [seven, eight, nine])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

sum
//  |> difference(nonNegative: false)
  |> yield()

But it seems he don´t combine the 3 queries correct to one result.

I just need always the value closest to the first or last in a month

grant2 · December 7, 2023, 10:19am

Hi @hhoe

I feel this can still be done in one query. Can you try this?

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> aggregateWindow(every: 1mo, fn: last, createEmpty: false)
  |> filter(fn: (r) => r["_value"] > 0)
  |> difference(nonNegative: true)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()

hhoe · December 7, 2023, 10:25am

Thank you @grant2

This is similar my first post and the query takes very long.

grant2 · December 7, 2023, 10:27am

Yeah, I thought that maybe having one aggregateWindow function (instead of two) might speed up your query. I wish there was a way to show what in your query is causing it to take so long.

Are there any other tags in your data that can be filtered so that the dataset is smaller before running the aggregateWindow funcdtion.

hhoe · December 7, 2023, 1:21pm

I tried now to read all data for 1 year. Data available from April.
He read the data withing 1s!

When I use:
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
he need around 1s

When I use:
|> aggregateWindow(every: 1d, fn: last, createEmpty: false)
he need around 10s

Did you know why last has this delay?
With mean ne need to read all values and calc the mean
With last or first he “just” need the first value

Why is mean so fast?

grant2 · December 7, 2023, 1:34pm

That is puzzling! Maybe ask this question in the Influx forum if you have not already done so.

I saw you had posted already there, but this specific question might lead to the answer as to why the query takes so long.

yosiasz · December 14, 2023, 8:26am

Is Powermeter a field ir a tag?
Very important read

Topic		Replies	Views
Query Timeout InfluxDB on 3Month Power Calculation InfluxDB	14	1177	April 24, 2023
New to Influxdb Grafana query InfluxDB influxdb , query-help	24	1986	March 10, 2023
Speed up long running Flux Query InfluxDB influxdb , query-help	16	4515	December 11, 2022
Help with processing consumption data for Dashboard InfluxDB grafana-cloud , dashboard	10	324	February 23, 2024
InfluxDb Query in Grafana is taking time to get result using filter Dashboards	0	75	April 8, 2024

Optimize a long query with influxdb

Related topics