Better to do calculations in the query or in the visualization script (via JavaScript)?

Hi,
I have some time series (energy data at one-second resolution) stored in InfluxDB. In my experience, the standard Grafana panels like time series, x-y plot, etc. reach their limits once you start calculating and customizing.
So I wanted some custom plots and tried the Plotly and Apache ECharts plug-ins. It works for now, even the coding in JavaScript (ChatGPT ftw :wink: ). But the calculations and transformations of my raw InfluxDB data could be done either in the query (or via Grafana transformations) or in the JavaScript section of the plug-in. The latter seems faster, and you get more help because JavaScript is more common than the Influx query language.
Right now I have some calculations in one place and some in the other, which doesn't feel very consistent. Typical calculations are grouping by weekday and by certain time periods per weekday and then showing average values, or sorting the power values and assigning them time shares in %. In the future I would like to build more graphs where users can choose variables themselves, e.g. the time period for the average values.

So my questions are:
1. Is it better to do the calculations in the query or in the visualization script (e.g. in JavaScript), and why is one way preferable over the other?
2. If the script way is preferable, is there a way to write and test the code somewhere else with proper debugging?

Thanks in advance!

In the query, because a TSDB (InfluxDB) is designed for this kind of computation, and it runs on the server, where you usually have more resources.

Let’s imagine a user selects a long time range, which returns a huge dataset, e.g. 100M rows. What will happen if you load this huge dataset into your browser (to be processed in JavaScript)? I guess your browser will become unresponsive.

Design your solution for edge cases, e.g. huge datasets that still have to be processed.
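
For example, the weekday averages you mentioned can be computed entirely on the server side, so only a handful of rows ever reach the browser. A rough Flux sketch; the bucket, measurement, and field names here are just placeholders:

import "date"

from(bucket: "my_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "my_meter" and r._field == "power_watt")
  |> map(fn: (r) => ({ r with weekday: date.weekDay(t: r._time) }))  // 0 = Sunday … 6 = Saturday
  |> group(columns: ["weekday"])
  |> mean()  // one average per weekday, so at most 7 rows are sent to Grafana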

Thanks for the food for thought… I wouldn’t have thought of that myself, and it shows that I don’t really understand the structure and how, for example, the plug-in with JavaScript works.

For now, the operations done in queries take longer than the ones in JavaScript. Does that mean I don’t have enough capacity on my InfluxDB server, or could there be another reason?

Is there another way to operate on the InfluxDB data in Grafana besides using the user interface in Grafana with the Influx Query Language?

That is an assumption without any facts.

Only you know your data, queries, and DB sizing, so only you can find the reason.
It can be anything. E.g. with a small dataset, the overhead of computing 2 records is probably bigger on the InfluxDB side than in JavaScript. It can be an inefficient query, e.g. filtering on fields instead of tags, high-cardinality tags, …
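
To illustrate the tag vs. field point: in Flux, a filter on a tag can use the index, while a filter on a field cannot. A minimal sketch with made-up names:

from(bucket: "my_bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "my_meter")
  |> filter(fn: (r) => r.machine_name == "press_01")  // tag filter: indexed, narrows the scan early
  |> filter(fn: (r) => r._field == "power_watt")      // field filter: not indexed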

Are you saying you record data every second? Do you really need that level of granularity?

Please post your schema.

This is my InfluxDB schema:

bucket company_xy
measurement meter1
tag keys (maybe one or two tags), e.g.:
  • measuring_campaign
  • machine_name
field keys
  • egy_irms
  • egy_irms_a
  • egy_irms_b
  • egy_irms_c
  • egy_kwh
  • egy_kwh_a
  • egy_kwh_b
  • egy_kwh_c
  • egy_pf
  • egy_pf_a
  • egy_pf_b
  • egy_pf_c
  • egy_vrms
  • egy_vrms_a
  • egy_vrms_b
  • egy_vrms_c
  • egy_watt
  • egy_watt_a
  • egy_watt_b
  • egy_watt_c

These can be continuous measurements or temporary ones for a specific period of time. Yes, the one-second granularity is necessary. Even if I don’t need it for every analysis, I often need it to see the behavior of a machine, which you can’t see in aggregated values.
Mostly, for the visualisations in Grafana I filter the values by the field “egy_watt” (power):

|> filter(fn: (r) => r["_field"] == "egy_watt")

I only need the other values, for example, to make sure that the meter is working properly and to check the plausibility of the values.

Your schema design approach is problematic and does not scale

You are filtering on a field; fields are not indexed, tags are.

The other schema design issue is that you have fields with suffixes _a, _b, _c instead of storing those as rows with an identifier tag (a, b, c).

Thanks for your assessment! I didn’t build the database myself, but I think I can follow the builder’s reasoning. All numeric values are stored in the fields, and the tags store additional information. The field keys are the default names the device sends via MQTT.

I understand your comments on the schema and the best-practice tips, but it’s really hard for me to transfer them to my data. Could you please help me with that?

And why is that?

Did you read the linked docs?

Yes, I did. I also watched this video: [Training] InfluxDB Performance Tuning and Schema Design, and I will try again… The video is from 2016, but I assume the principles are still valid.

So, I could structure my data like this (with egy = irms/kwh/watt, … and phase = a/b/c/total):

  • Option 1: where egy and phase are tags and value is a field
    meter1,egy=watt,phase=a value=100 <timestamp>

  • Option 2: where only egy is a tag
    meter1,egy=watt a=100,b=100,c=100,total=300 <timestamp>

Am I on the right track? Which option is preferable?
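
For context, this is roughly how I imagine querying each option (just a sketch using the names from above):

// Option 1: phase is selected via indexed tags
from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r.egy == "watt" and r.phase == "a")

// Option 2: phase is selected via a field filter (not indexed)
from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r.egy == "watt")
  |> filter(fn: (r) => r._field == "a")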

I would go more explicit so that even grandma can parse things.

Avoid a,b,c unless those are industry standard

weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000

Sure, I’ll have to think about the names a little longer (though phase a, b, c or 1, 2, 3 is the industry standard here). If I’m likely to use phase=total most often, i.e. regularly filter by total, option 1 is preferable, right?

Not trying to get too deep into the weeds, but in your current schema, can you explain (as if to your grandma) what these three fields mean?

Also, I’d say in general, use Flux to arrive at the total (if you are summing 3 phase values).
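
With your current field names, summing the three phases in Flux could look roughly like this (a sketch, not tested against your data):

from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r._field == "egy_watt_a" or r._field == "egy_watt_b" or r._field == "egy_watt_c")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ _time: r._time, total_watt: r.egy_watt_a + r.egy_watt_b + r.egy_watt_c }))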

Hi grandma :slight_smile: These are the original names sent by the electrical measuring device. The part after egy_ stands for the measured variable, like watt for power or vrms for voltage root mean square. a, b, c stand for one of the three current phases. No suffix means the “total”, i.e. the sum or the average of the values (depending on the measured parameter).

So, that approach is tribal knowledge: you had to explain what is implicit, and it does not have the grandma stamp of approval :older_woman:. Good schema design explains itself explicitly.

I totally agree. As I said, those are just the names sent by the device and I will try to improve naming and schema design in future.

This is my current draft:

bucket company_xy
measurement meter1
tag keys
  • measurand (current_rms, energy, power_factor, voltage_rms, power)
  • unit (ampere, kwh, none (?), volt, watt)
  • aggregation (average, sum)
field keys
  • total
  • (phase_a)
  • (phase_b)
  • (phase_c)

What I’m still struggling with is the question whether the values for total and phase a, b, c go in the same measurement or become separate measurements. I found arguments for both ways… Right now I lean towards putting them in their own measurements. I could then still filter on the aggregation tag when I want to use the aggregated value (which will be the case most of the time).
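
For what it’s worth, querying the draft above for the summed power would then look roughly like this (just a sketch based on my draft names):

from(bucket: "company_xy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "meter1")
  |> filter(fn: (r) => r.measurand == "power" and r.aggregation == "sum")  // indexed tag filters
  |> filter(fn: (r) => r._field == "total")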
