Better to do calculations in the query or in the visualization script (via JavaScript)?

Hi,
I have some time series (energy data at one-second resolution) stored in InfluxDB. In my experience, the standard Grafana panels like time series, x-y plot, etc. reach their limits once you start calculating and customizing.
So I wanted some custom plots and tried the Plotly and Apache ECharts plug-ins. It works for now, even the coding in JavaScript (ChatGPT ftw :wink: ). But the calculations and transformations of my raw InfluxDB data could be done either in the query (or via Grafana transformations) or in the JavaScript section of the plug-in. The latter seems faster, and you get more help because JavaScript is more common than the Influx query language.
Right now I have some calculations in one place and some in the other, which doesn't feel very consistent. Typical calculations are grouping by weekday and by certain time periods per weekday and then showing average values, or sorting the power values and assigning them time shares in %. In the future I would like to build more graphs where users can choose variables themselves, e.g. the time period for the average values.

So my questions are:
1. Is it better to do the calculations in the query or in the visualization script (e.g. in JavaScript), and why is one way preferable over the other?
2. If the script way is preferable, is there a way to write and test the code somewhere else with proper debugging?

Thanks in advance!

In the query, because a TSDB (InfluxDB) is designed for this kind of computation, and it runs on the server, where you usually have more resources.

Let’s imagine a user selects a long time range, which returns a huge dataset, e.g. 100M rows. What will happen if you load this huge dataset into your browser (to be processed in JavaScript)? I guess your browser will become unresponsive.

Design your solution for edge cases, e.g. huge datasets that still have to be processed.
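
For example, the weekday averages you mentioned can be computed entirely on the server side, so only a handful of rows ever reach the browser. A rough Flux sketch; the bucket, measurement, and field names here are just placeholders:

import "date"

from(bucket: "my_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "my_meter" and r._field == "power_watt")
  |> map(fn: (r) => ({ r with weekday: date.weekDay(t: r._time) }))  // 0 = Sunday … 6 = Saturday
  |> group(columns: ["weekday"])
  |> mean()  // one average per weekday, so at most 7 rows are sent to Grafana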

Thanks for the food for thought… I wouldn’t have thought of that myself, and it shows that I don’t really understand the structure and how, for example, the plug-in with JavaScript works.

For now, the operations done in queries take longer than the ones in JavaScript. Does that mean I don’t have enough capacity on my InfluxDB server, or could there be another reason?

Is there another way to operate on the InfluxDB data in Grafana besides using the user interface in Grafana with the Influx Query Language?

That is an assumption without any facts.

Only you know your data, queries, and DB sizing, so only you can find the reason.
It can be anything. E.g. with a small dataset, the overhead of computing 2 records is probably bigger on the InfluxDB side than in JavaScript. It can be an inefficient query, e.g. filtering on fields instead of tags, high-cardinality tags, …
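
To illustrate the tag vs. field point: in Flux, a filter on a tag can use the index, while a filter on a field cannot. A minimal sketch with made-up names:

from(bucket: "my_bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "my_meter")
  |> filter(fn: (r) => r.machine_name == "press_01")  // tag filter: indexed, narrows the scan early
  |> filter(fn: (r) => r._field == "power_watt")      // field filter: not indexed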

Are you saying you record data every second? Do you really need that level of granularity?

Please post your schema.

This is my InfluxDB schema:

bucket company_xy
measurement meter1
tag keys (maybe one or two tags), e.g.:
  • measuring_campaign
  • machine_name
field keys
  • egy_irms
  • egy_irms_a
  • egy_irms_b
  • egy_irms_c
  • egy_kwh
  • egy_kwh_a
  • egy_kwh_b
  • egy_kwh_c
  • egy_pf
  • egy_pf_a
  • egy_pf_b
  • egy_pf_c
  • egy_vrms
  • egy_vrms_a
  • egy_vrms_b
  • egy_vrms_c
  • egy_watt
  • egy_watt_a
  • egy_watt_b
  • egy_watt_c

These can be continuous measurements or temporary ones for a specific period of time. Yes, the one-second granularity is necessary. Even if I don’t need it for every analysis, I often need it to see the behavior of a machine, which you can’t see in aggregated values.
Mostly, for the visualisations in Grafana I filter the values by the field “egy_watt” (power):

|> filter(fn: (r) => r["_field"] == "egy_watt")

I only need the other values, for example, to make sure that the meter is working properly and to check the plausibility of the values.

Your schema design approach is problematic and does not scale

You are filtering on a field; fields are not indexed, tags are.

The other schema design issue is that you have fields with suffixes _a, _b, _c instead of storing those as rows with an identifier tag (a, b, c).

Thanks for your assessment! I didn’t build the database myself, but I think I can follow the builder’s reasoning. All numeric values are stored in the fields, and the tags store additional information. The field keys are the default names the device sends via MQTT.

I understand your comments on the schema and the best-practice tips, but it’s really hard for me to transfer them to my data. Could you please help me with that?

And why is that?

Did you read the linked docs?

Yes, I did. I also watched this video: [Training] InfluxDB Performance Tuning and Schema Design, and I will try again… The video is from 2016, but I assume the principles are still valid.

So, I could structure my data like this (with egy = irms/kwh/watt, … and phase = a/b/c/total):

  • Option 1: where egy and phase are tags and value is a field
    meter1,egy=watt,phase=a value=100 <timestamp>

  • Option 2: where only egy is a tag
    meter1,egy=watt a=100,b=100,c=100,total=300 <timestamp>

Am I on the right track? Which option is preferable?
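
For context, this is roughly how I imagine querying each option (just a sketch using the names from above):

// Option 1: phase is selected via indexed tags
from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r.egy == "watt" and r.phase == "a")

// Option 2: phase is selected via a field filter (not indexed)
from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r.egy == "watt")
  |> filter(fn: (r) => r._field == "a")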

I would go more explicit so that even grandma can parse things.

Avoid a,b,c unless those are industry standard

weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000

Sure, I’ll have to think about the names a little longer (though phase a, b, c or 1, 2, 3 is the industry standard here). If I’m likely to use phase=total most often, i.e. regularly filter by total, option 1 is preferable, right?

Not trying to get too deep into the weeds, but in your current schema, can you explain (as if to your grandma) what these three fields mean?

Also, I’d say in general, use Flux to arrive at the total (if you are summing 3 phase values).
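
With your current field names, summing the three phases in Flux could look roughly like this (a sketch, not tested against your data):

from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r._field == "egy_watt_a" or r._field == "egy_watt_b" or r._field == "egy_watt_c")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ _time: r._time, total_watt: r.egy_watt_a + r.egy_watt_b + r.egy_watt_c }))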

Hi grandma :slight_smile: These are the original names sent by the electrical measuring device. The part after egy_ stands for the measured variable, like watt for power or vrms for voltage root mean square. a, b, c stand for one of the three current phases. No suffix means the “total”, i.e. the sum or the average of the values (depending on the measured parameter).

So, that approach is tribal knowledge: you had to explain what is implicit, and it does not have the grandma stamp of approval :older_woman:. Good schema design explains itself explicitly.

I totally agree. As I said, those are just the names sent by the device and I will try to improve naming and schema design in future.

This is my current draft:

bucket company_xy
measurement meter1
tag keys
  • measurand (current_rms, energy, power_factor, voltage_rms, power)
  • unit (ampere, kwh, none (?), volt, watt)
  • aggregation (average, sum)
field keys
  • total
  • (phase_a)
  • (phase_b)
  • (phase_c)

What I’m still struggling with is the question whether the values for total and phase a, b, c go in the same measurement or become separate measurements. I found arguments for both ways… Right now I lean towards putting them in their own measurements. I could then still filter on the aggregation tag when I want to use the aggregated value (which will be the case most of the time).
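
For what it’s worth, querying the draft above for the summed power would then look roughly like this (just a sketch based on my draft names):

from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r.measurand == "power" and r.aggregation == "sum")  // indexed tag filters
  |> filter(fn: (r) => r._field == "total")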
