Power consumption per day with non-equidistant data points

Hello

I am trying to get a “power consumption per day” and I am aware that there are a lot of topics dealing with that problem in general. But I think my problem has something “special” to it that none of the answers I read dealt with. At least as far as I dug into those postings. If I missed something, please be kind with me :slight_smile: .

The data I write into InfluxDB comes from a power meter that emits 1000 impulses per 1 kWh. This is converted into a current power value and written into InfluxDB.
The “special” part is that the data points are not equidistant in time.

So to calculate the power consumption per day I would need the power from one data point and the time distance from the current point to the next.
With this information the consumed energy in kWh can be calculated for that period of time.
This should then be repeated for every pair of data points per day, and the sum of all of this gives the total energy of a day.

This is what I would do to calculate it via some sort of software algorithm.
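The steps above can be sketched as ordinary code. This is a minimal sketch in plain Python (not Flux), with made-up sample data, assuming each power value holds until the next data point:

```python
# Minimal sketch of the algorithm described above: a left Riemann sum over
# non-equidistant samples. Timestamps are in seconds, power is in watts.
# The sample data below is invented for illustration only.

def energy_kwh(points):
    """points: list of (timestamp_s, power_w) tuples, sorted by timestamp.

    Each power value is assumed to hold until the NEXT data point, so the
    last point only marks the end of the final interval.
    """
    total_ws = 0.0  # accumulated energy in watt-seconds
    for (t0, p0), (t1, _) in zip(points, points[1:]):
        total_ws += p0 * (t1 - t0)  # power * duration of this interval
    return total_ws / 3_600_000  # Ws -> kWh (60 * 60 * 1000)

# Example: 500 W for 3600 s, then 1000 W for 1800 s
samples = [(0, 500.0), (3600, 1000.0), (5400, 0.0)]
print(energy_kwh(samples))  # 1.0 (0.5 kWh + 0.5 kWh)
```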
But, to be honest, “translating” this into an InfluxDB query is rather hard for me. Regarding Grafana and InfluxDB I am rather a newbie.
I guess I would have a clue how to do it with sum() and aggregate functions if the data points were equidistant in time. But with unequally spaced points it is hard, I think.

Does somebody have an idea how to solve this?
Every hint and suggestion is welcome!

Welcome @monstereye

Do you have the ability to change from InfluxQL to Flux? This sort of thing is much easier with Flux. If you post a sample of your data, I or someone else can probably work out the Flux query and try out some suggested solutions.

Yes I can also use Flux language.
But since my Grafana still states that the Flux implementation is in beta, it was not my first choice.

I have a functional Flux language access to my influxdb in grafana.

OK, and here follows a sample of the raw data (excerpt of 1 h of data):

I just attached a screenshot from the InfluxDB dashboard itself (uploading the raw data, e.g. as CSV, is not possible because the attachment file type is not supported).

The measurement is energie2.
I am interested in the _value column and the time range (_time) as described above.

The _start and _stop columns can be ignored here. They are filled with the query’s own time range (the current time at query execution for _start, plus the query’s time offset for _stop) and have nothing to do with the stored data (as far as I can see).

As you can see in the _time column the points are not equidistant.
Even with the mean() function…

The value is the current power in kW during the time range from current point to next point. Name, category, etc. are not of interest and are logged for information only.
Bucket name is loxone/autogen.

Many thanks for your help in advance!

@monstereye

Hopefully this will get you started. Here is some fake data where I made up the “values” to mimic what you had:

Given the above, we first want to know the difference between subsequent values. I used the difference() function.

from(bucket: "bucket_z")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "energie2")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["category"] == "Energie")
  |> filter(fn: (r) => r["name"] == "Verbrauchszähler")
  |> difference()
  |> drop(columns: ["_start", "_stop"])
  |> yield(name: "mean")

which gives us this:

To get the elapsed time between each timestamp, I used the elapsed() function, as well as a map() function to convert it to a float (I was getting an error otherwise when trying to do math on it with another float).

from(bucket: "bucket_z")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "energie2")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["category"] == "Energie")
  |> filter(fn: (r) => r["name"] == "Verbrauchszähler")
  |> difference()
  |> elapsed() 
  |> map(fn: (r) => ({r with elapsed: float(v: r.elapsed)}))
  |> drop(columns: ["_start", "_stop"])
  |> yield(name: "mean")

Finally, I used another map() function to multiply the _value by the elapsed time.

from(bucket: "bucket_z")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "energie2")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["category"] == "Energie")
  |> filter(fn: (r) => r["name"] == "Verbrauchszähler")
  |> difference()
  |> elapsed()
  |> map(fn: (r) => ({r with elapsed: float(v: r.elapsed)}))
  |> map(fn: (r) => ({ r with newColumn: r._value * r.elapsed }))
  |> drop(columns: ["_start", "_stop"])
  |> yield(name: "mean")

which gives this:

Wow what a detailed answer. Many thanks for that!! It is a lot of help for me.

I think I understood what you did. And have some questions/comments.

I don’t think the difference() function is what is needed here, because it only gives you the difference from the previous value to the current one, and it can be negative, which makes no sense.
E.g. if the power of the current data point is less than that of the point before, it means that the power consumption during this time interval is lower than during the previous interval. It does not mean that I produced energy :wink:

OK, I now tried the following:

from(bucket: "loxone/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "energie2")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["category"] == "Energie")
  |> filter(fn: (r) => r["name"] == "Verbrauchszähler")
  |> elapsed(unit: 1s)
  |> map(fn: (r) => ({ r with elapsed: float(v: r.elapsed)}))
  |> map(fn: (r) => ({ r with power_Ws: r._value * r.elapsed }))
 
  |> map(fn: (r) => ({ r with power_kWh: r.power_Ws / 3600000.0}))
  
  |> drop(columns: ["_start", "_stop","category", "name", "room", "state", "type", "_field", "_measurement"])
  |> yield(name: "mean")

What I did is take the value of a data point and multiply it by the number of seconds in the time difference (from the elapsed() function). For that period of time this yields the consumed energy (in watt-seconds, Ws). To get kWh, simply divide by 3,600,000 (60 * 60 * 1000).
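The unit conversion can be sanity-checked in a few lines of plain Python (not Flux; the numbers are made up):

```python
# Sanity check for the unit conversion: power [W] times elapsed time [s]
# gives energy in watt-seconds (= joules); dividing by 60*60*1000 gives kWh.
power_w = 1000.0    # a constant 1 kW load
elapsed_s = 3600.0  # running for one hour
energy_ws = power_w * elapsed_s
energy_kwh = energy_ws / (60 * 60 * 1000)
print(energy_kwh)  # 1.0 -> a 1 kW load for one hour is exactly 1 kWh
```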

But then another question arises:
I want to have an overview of the consumed energy per day, month and year.
To begin with a day (as originally questioned):
I have to specify the “day” time range and sum up all the kWh within a day (beginning at 0:00 and ending at 24:00).
How can I do this? It should not be the time range given at the start of the Flux expression with range(start: v.timeRangeStart, stop: v.timeRangeStop), because that is the interval I want to see the data within. E.g. I want to see the power consumption per day for the last 30 days (the last month).
This means range(start: v.timeRangeStart, stop: v.timeRangeStop) covers the 30 days/1 month,
and NOT the single per-day range.

So big question:
How do I summarize within a certain time range that differs from the visible time range of the plot?

I tried something with windowing (and I think I’m on the right track - hopefully :-))

from(bucket: "loxone/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "energie2")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["category"] == "Energie")
  |> filter(fn: (r) => r["name"] == "Verbrauchszähler")
  |> elapsed(unit: 1s)
  |> map(fn: (r) => ({ r with elapsed: float(v: r.elapsed)}))
  |> map(fn: (r) => ({ r with power_Ws: r._value * r.elapsed }))
 
  |> map(fn: (r) => ({ r with power_kWh: r.power_Ws / 3600.0}))
  |> window(every: 1d) 
  |> sum(column: "power_kWh")
  |> drop(columns: ["_start", "_stop","category", "name", "room", "state", "type", "_field", "_measurement"])

This seems to work (in the Influx panel directly with table output), except that I get some half days at the beginning and the end. But if I try this in Grafana’s time series panel, it tells me that the data has no time field (and this is correct).
The time field gets lost somewhere in between…
Therefore this cannot be the final solution.
Hopefully @grant2 you have an idea?

@monstereye

Perhaps try replacing this:

 |> window(every: 1d)

with this:

 |> aggregateWindow(every: 1d, fn: sum)

Does that make any difference?

Works like a charm :slight_smile:
Is this even possible with standard queries?
This is a good starting point to learn a little bit more about Flux language.
Thanks in any case for your support!

OK, to help others, here comes the Flux expression in all its beauty:

from(bucket: "my_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "energie2")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["category"] == "Energie")
  |> filter(fn: (r) => r["name"] == "Verbrauchszähler")
  |> elapsed(unit: 1s)
  |> map(fn: (r) => ({ r with elapsed: float(v: r.elapsed)}))
  |> map(fn: (r) => ({ r with power_Ws: r._value * r.elapsed }))
 
  |> map(fn: (r) => ({ r with power_kWh: r.power_Ws / 3600.0}))
  |> aggregateWindow(every: 1d, fn: sum, column: "power_kWh")

  |> drop(columns: ["_start", "_stop","category", "name", "room", "state", "type", "_field", "_measurement"])

Two things to mention:

  1. For an unknown reason I had to remove the division by 1000 to get values in the expected range.
  2. I still have to double-check by hand whether it really works as expected.
    E.g. the elapsed() function stores the values in a certain row. The question for the double check is whether I then really use the correct values, or whether there is a mismatch of one line. That would result in multiplying the wrong time duration with the power level… Since power usually behaves continuously, this should not be a big error…

In any case: many thanks for your support. I hope this helps others.

3 Likes

As already discussed in “speed up long running query”, this query is rather slow.

In my environment InfluxDB is running on a Raspberry Pi 4B only, which is not optimal.
My power meter produces roughly 10,000 data points per day, so summarizing takes a lot of time.

I now prefer to do this “on the fly” with a task that runs every day.
While handling already-written points (summing up the energy consumption per day), I ran into another problem with the query that I don’t want to leave unmentioned (for others who will read this thread):

The problem is located in the elapsed function.
elapsed() calculates the time difference between the current and the previous data point and writes it into the row of the current data point. This means it “looks” back and not forward, which matters for the energy calculation.

Calculation of the used energy means the following:
The power meter generates 1000 pulses for every 1 kWh. In my case my home automation calculates, from the pulses, the power from pulse to pulse. The power changes with every incoming pulse, and this is the power my house needs at the moment (until the next pulse).
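For reference, the pulse-to-power conversion works out like this (a plain-Python sketch under the stated assumption of 1000 pulses per kWh, i.e. each pulse represents 1 Wh; the function name is my own):

```python
# With 1000 pulses per kWh, each pulse represents 1 Wh = 3600 Ws.
# The average power between two consecutive pulses is therefore
# 3600 / dt watts, where dt is the time between pulses in seconds.

def power_between_pulses_w(dt_seconds, pulses_per_kwh=1000):
    wh_per_pulse = 1000.0 / pulses_per_kwh  # energy per pulse in Wh
    ws_per_pulse = wh_per_pulse * 3600.0    # same energy in watt-seconds
    return ws_per_pulse / dt_seconds        # average power in watts

print(power_between_pulses_w(36.0))    # 100.0 -> one pulse every 36 s is a 100 W load
print(power_between_pulses_w(3600.0))  # 1.0   -> one pulse per hour is a 1 W load
```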

This means that I would rather need an elapsed function that looks forward, i.e. one that calculates the difference between two rows and stores it in the first row. This would mean not omitting the first row but omitting the last row instead.
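The forward-looking variant can be sketched in plain Python (an illustration of the idea only, with made-up data; in Flux this behaviour is what events.duration(), mentioned below, provides):

```python
# Forward-looking duration: attach to each point the time until the NEXT
# point, instead of the time since the previous one (as elapsed() does).
# The last point gets no duration and is dropped; the first one is kept.

def with_forward_duration(points):
    """points: list of (timestamp_s, power_w) tuples, sorted by timestamp.

    Returns (timestamp_s, power_w, duration_s) tuples; the last input
    point only terminates the final interval and yields no output row.
    """
    return [
        (t0, p0, t1 - t0)
        for (t0, p0), (t1, _) in zip(points, points[1:])
    ]

samples = [(0, 500.0), (3600, 1000.0), (5400, 0.0)]
print(with_forward_duration(samples))
# [(0, 500.0, 3600), (3600, 1000.0, 1800)]
```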

Therefore I now use the events.duration() function in my task queries, see:
https://docs.influxdata.com/flux/v0.x/stdlib/contrib/tomhollingworth/events/duration/

For details of the query, please have a look at the thread mentioned above.

Great explanation. Thanks for sharing.

Hopefully it is of help for somebody … :slight_smile:

Calculating used/produced energy from a power signal is not as easy as it sounds once you dig into the details… I thought it would be easier :woozy_face: