The reason behind this is to calculate energy over certain time ranges when the observed power is stored with non-equidistant timestamps.
In general it works well, at least for short time ranges.
But if I want to evaluate a larger time range, it is very slow.
What I can think of is that my Grafana dashboard requests the same functions more than once.
E.g. it has a per-day consumption panel and a per-month consumption panel (and so on).
Is it possible to build up a cache automatically?
My data is written into the underlying InfluxDB via Node-RED, and I could also imagine a data transformation that stores the “time between two data points” * power / 3600 calculation directly in a separate table.
But maybe there is a more intelligent solution to speed up the query?
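For reference, the kind of calculation I mean, sketched in Flux (bucket, measurement, and field names here are placeholders, not my actual schema):

```flux
// Energy per day from non-equidistant power samples:
// Wh = power [W] * time since previous point [s] / 3600
from(bucket: "power")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter" and r._field == "power_w")
  |> elapsed(unit: 1s)  // seconds between consecutive points
  |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed) / 3600.0}))
  |> aggregateWindow(every: 1d, fn: sum)  // daily energy in Wh
```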
Sure, but to get meaningful information from this, one should correlate it with the number of data points. Otherwise the timing information is rather useless.
So my “raw” data comes from a power meter that emits 1000 pulses per kWh.
Therefore the number of data points per time unit depends on my actual power consumption, and this varies over time.
I had a rough look at the data points, and as a worst-case scenario we can calculate with one point per minute (although in reality it will most likely be less than that).
OK, now the evaluation of the query durations, with an aggregate window of 1 day:
- time range 1 day: immediate (milliseconds)
- 7 days: approx. 1 s
- 30 days: approx. 3 s
- 90 days: approx. 8 s
- last 6 months: 13 s
What is bothering me now is that the last-year query returns 3 results (colors). Therefore I attached pictures of the last 6 months and the last year.
Could this be the reason for the long-running query?
For me, 13 s for 6 months, extrapolated to around half a minute for 1 year, would be OK.
So is there maybe something in my raw data that causes the problem, and not the query itself?
In general the problem is that I don’t have only one of those queries.
I want to display the production and consumption per day and per year in parallel. That adds up to 4 panels with such queries…
Maybe that is too much for the Raspberry Pi 4 it runs on.
In parallel I am digging into Node-RED a bit more, to maybe do the time * power calculation in Node-RED BEFORE the data goes into InfluxDB. Then I could get rid of the map functions and the elapsed function. The already existing data I can transform with a small program (in Java, for example).
But the easiest way, if it exists, would be to just do it in Grafana…
I can, if necessary, also evaluate the times needed for an aggregate window of one month…
The more I dig into it, the more confused I get.
OK, queries with map functions can be slow, that’s understood.
The first thing I am confused about is that a single query over a certain time range returns 3 results (as visible in the image in my last post).
This also happens directly on InfluxDB with a much simpler query (and a time range chosen such that only 2 result curves come back from the Flux query).
But a column named “table” doesn’t exist in my data / in InfluxDB itself.
Where does it come from? I did a rough search in the InfluxDB documentation but found nothing…
The reason for this question is that (this is my impression at least), as long as I don’t query across these weird “borders” in my data, the queries are rather fast. “Rather” meaning they could be more performant, but that is maybe limited by the CPU power of the Raspberry Pi on which InfluxDB runs…
Another question: would it be better to ask the InfluxDB folks directly, i.e. is this the wrong place for this question?
Welcome to the sometimes confusing world of InfluxDB! You can try posting on their forum, but this forum is more active and I think answers come quicker. Having said that, you may want to read through this to better understand the Flux table structure.
Try inserting a group() statement at the end of your query.
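For example (with placeholder names for the rest of the query):

```flux
from(bucket: "power")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._field == "power_w")
  // ... your map/elapsed/aggregateWindow pipeline ...
  |> group()  // clears the group key so everything ends up in one table
```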
Your second question regarding “_result” and “table” columns:
This is exactly my question!
I don’t know why they are there. I think they are NOT contained in InfluxDB itself (although I am not really sure).
If I drop “_result” (as explained above), nothing happens.
If I drop “table” and “_result”, I still get two result series back from the query.
One shows the data, the other a constant value…
I will try to find some time over the weekend to dig deeper into this, maybe with a bit of Java code and SQL queries to see whether this behaviour is reproducible there too…
@yosiasz You are right! It’s possible but maybe not advised to do so.
My original question (since I am a newbie regarding the Flux language) was whether I can do something to improve the query itself. If not, I will look for a host with more power.
Nevertheless, this topic has become more of a discussion about the weird content of the query result…
But thanks for your answers!
I went back through this thread to re-familiarize myself with your situation. Did you change something back in March 2022? It would appear that your data was slightly modified (twice), which is why it gets rendered as different series.
The group() statement directly after the filter makes sure all data is summarized in one table.
This didn’t work at the end of the query, but in between it seems to work properly.
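Sketched with placeholder names, the placement looks like this:

```flux
from(bucket: "power")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter" and r._field == "power_w")
  |> group()  // directly after the filter: merge all series into one table
  |> elapsed(unit: 1s)
  |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed) / 3600.0}))
```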
Then I had the idea that the map statement runs faster on less data, so I tried to exploit this with an additional aggregateWindow. This has to be handled carefully (at least that is what my experiments tell me), because if the aggregate window used to reduce the data is set to too short a period, it slows the query down even further.
The idea behind this (and it is only true for my case, I think) is that I can live with a certain inaccuracy over large time ranges. The inaccuracy comes from the mean function in the first aggregate window.
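A sketch of this two-stage idea (window sizes and names are only illustrative; the first aggregateWindow with mean is what introduces the inaccuracy):

```flux
from(bucket: "power")
  |> range(start: -1y)
  |> filter(fn: (r) => r._field == "power_w")
  |> aggregateWindow(every: 1h, fn: mean)  // thin out the data before the expensive map
  |> elapsed(unit: 1s)
  |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed) / 3600.0}))
  |> aggregateWindow(every: 1d, fn: sum)   // daily energy
```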
The next performance improvement is (I hope, at least) to reduce the map functions to a single one. In the beginning I had three, which I have now reduced to one.
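The principle (the arithmetic here is only an example): each map() is a full pass over the data, so chained maps can be merged into one expression.

```flux
// Instead of three separate passes ...
//   |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed)}))
//   |> map(fn: (r) => ({r with _value: r._value / 3600.0}))
//   |> map(fn: (r) => ({r with _value: r._value / 1000.0}))
// ... one combined map:
  |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed) / 3600.0 / 1000.0}))
```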
Last thought: as the documentation of aggregateWindow says, it returns only the column the window was running on. Therefore the additional drop statement on category, name, etc. can possibly be skipped.
My assessment: I cannot tell why the data within InfluxDB is the way it is. I don’t know where the column named “result” comes from (I don’t write it), or why its value is sometimes null and sometimes _result.
An idea, not sure it works for your situation:
Why not create a task in InfluxDB that calculates the results periodically (e.g. every 15 minutes) and use that data in Grafana, so that Grafana doesn’t have to do the full calculation?
I have more or less the same situation: the data is stored in InfluxDB, the delta time is not always the same, and I want the data aggregated into 15-minute intervals (every 15 minutes).
A task in InfluxDB calculates the quarter-hourly demand every 15 minutes and stores it in another bucket.
This is what I do: every 15 minutes, pick all the stored records, do the calculation, sum the result, and store it in a new bucket. That one is used in Grafana.
(In this case I take the power that was measured in a time frame, multiply it by the elapsed time frame in seconds, sum the results into one value, and divide by the number of seconds in 15 minutes → this gives me the quarter-hourly demand.)
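Sketched as a Flux pipeline (bucket and field names are assumptions): sum(power * elapsed seconds) / 900 per 15-minute window.

```flux
from(bucket: "raw")
  |> range(start: -15m)
  |> filter(fn: (r) => r._field == "power_w")
  |> elapsed(unit: 1s)
  |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed) / 900.0}))
  |> aggregateWindow(every: 15m, fn: sum)  // quarter-hourly demand in W
  |> to(bucket: "demand_15m")
```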
For now this works; I still have to check whether the result is correct in all cases.
What is obvious now is that there is no faster query giving the same result, and that most of the time needed is consumed by the map, elapsed, and aggregateWindow functions.
So reducing the amount of data calculated at once, and spreading the work over time as the data points are written to InfluxDB, should do the trick.
I was about to look at Kapacitor and then found out that Kapacitor cannot handle InfluxDB 2.
Then I read your reply to my question.
My solution is now exactly this: a task that runs every 15 minutes (even once per day would be possible, I think) over a 15-minute time range.
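A sketch of what such a task could look like (all names are placeholders, not my actual code):

```flux
option task = {name: "energy_15m", every: 15m}

from(bucket: "raw")
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "meter" and r._field == "power_w")
  |> group()
  |> elapsed(unit: 1s)  // 1 s resolution is accurate enough here
  |> map(fn: (r) => ({r with _value: r._value * float(v: r.elapsed) / 3600.0}))
  |> aggregateWindow(every: 15m, fn: sum)
  |> to(bucket: "energy_15m")
```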
I reduced the two map functions to one.
elapsed with a time unit of 1 s is enough accuracy for me; I don’t need calibrated evaluations.
Spikes in the power flow are (to be honest) already smoothed out by the mere 1000 pulses/kWh of the power meter…
Now I am writing a small Java program, based on the InfluxDB Java client, to handle the 2 years of data points that are already in the database.
If the code quality is good enough, I will post a GitHub link afterwards.