Optimize a long-running query in InfluxDB

Hello!

I'm using:
Grafana
InfluxDB: v2.7.3

I'd like to display the monthly consumption from my smart meter:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
  |> filter(fn: (r) => r["_value"] > 0)
  |> difference(nonNegative: true)
  |> aggregateWindow(every: 1mo, fn: sum, createEmpty: false)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()

After a loooong wait, I get the result:

Is my query very slow?

Hi @hhoe

Is your electricity meter value always increasing? If yes, perhaps using the spread() function would be more efficient? Something like this:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> aggregateWindow(every: 1mo, fn: spread, createEmpty: false)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()

Thank you @grant2

Yes, it's always increasing. But this doesn't work:

I'm also thinking about this approach:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

seven = from(bucket: "home")
  |> range(start: date.add(d: 6mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 6mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

eight = from(bucket: "home")
  |> range(start: date.add(d: 7mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 7mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

nine = from(bucket: "home")
  |> range(start: date.add(d: 8mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 8mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

// named "monthly" to avoid shadowing the built-in sum() function
monthly = union(tables: [seven, eight, nine])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

monthly
//  |> difference(nonNegative: false)
  |> yield()

But it seems it doesn't combine the three queries correctly into one result.

I just always need the value closest to the first or last day of each month.

Hi @hhoe

I feel this can still be done in one query. Can you try this?

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> aggregateWindow(every: 1mo, fn: last, createEmpty: false)
  |> filter(fn: (r) => r["_value"] > 0)
  |> difference(nonNegative: true)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()

Thank you @grant2

This is similar to my first post, and the query takes a very long time.

Yeah, I thought that maybe having one aggregateWindow function (instead of two) might speed up your query. I wish there was a way to show what in your query is causing it to take so long.

Are there any other tags in your data you could filter on, so that the dataset is smaller before the aggregateWindow function runs?
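For what it's worth, Flux does ship a profiler package that reports where the time goes. A minimal sketch, assuming your InfluxDB build has the profiler enabled (it has been part of InfluxDB 2.x for a while) and using the bucket/measurement names from this thread:

```flux
import "profiler"

// Enable the Flux profilers: the query result will include extra
// tables ("profiler/query" and "profiler/operator") listing the total
// execution time and the time spent in each operator.
option profiler.enabledProfilers = ["query", "operator"]

from(bucket: "home")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "PV")
  |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
  |> yield()
```

Running the slow query once with this option set should show which operator dominates the runtime.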

I have now tried reading all the data for one year. Data is available from April.
It read the data within 1 s!

When I use:
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
it needs around 1 s.

When I use:
|> aggregateWindow(every: 1d, fn: last, createEmpty: false)
it needs around 10 s.

Do you know why last causes this delay?
With mean it has to read all values and calculate the mean.
With last or first it "just" needs one value per window.

Why is mean so fast?

That is puzzling! Maybe ask this question in the Influx forum if you have not already done so.

I saw you had posted already there, but this specific question might lead to the answer as to why the query takes so long.
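One possible explanation (a guess, not verified against your setup): Flux can push certain range → filter → aggregateWindow combinations down into the storage engine, and which aggregate functions qualify for this pushdown depends on the InfluxDB version; mean is pushed down in versions where last is not. Since your meter value only ever increases, fn: max returns exactly the same value per window as fn: last, so it may be worth comparing the two. A sketch based on the query from earlier in the thread:

```flux
import "timezone"
import "date"

option location = timezone.location(name: "Europe/Vienna")

startDate = date.truncate(t: today(), unit: 1y)

// For a monotonically increasing counter, max per window == last per
// window, and max may be pushed down to storage where last is not.
// Keep range -> filter -> aggregateWindow unbroken; per-row work
// (_value > 0, difference) comes after, on the reduced series.
from(bucket: "home")
  |> range(start: startDate, stop: now())
  |> filter(fn: (r) => r._measurement == "PV" and r.Powermeter == "Hauptzaehler")
  |> filter(fn: (r) => r._field == "Total Geliefert" or r._field == "Totaler Bezug")
  |> aggregateWindow(every: 1mo, fn: max, createEmpty: false)
  |> filter(fn: (r) => r._value > 0)
  |> difference(nonNegative: true)
  |> timeShift(duration: -1s, columns: ["_time"])
  |> yield()
```

If max is fast and last is slow on the same data, that would point at pushdown as the culprit.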


Is Powermeter a field or a tag?
That's very important, because tags are indexed and cheap to filter on, while fields are not.
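To check, the schema package can list a measurement's tag keys. A sketch, assuming the bucket and measurement names used in this thread:

```flux
import "influxdata/influxdb/schema"

// If "Powermeter" appears in this list it is a tag (indexed);
// otherwise it is stored as a field and every filter on it must
// scan the raw values.
schema.measurementTagKeys(bucket: "home", measurement: "PV")
```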