Aggregation by time messes up data, need to rethink about a whole new solution

Hi Everyone,

Colleagues and myself encountered a problem with one of our analytics set.

We are open for any suggestion, even if we need to change the way we collect the data and the DB we are using.

The case:

Our users creates hundreds of ec2 instances per day, in many regions, zones, types, etc…

We have an analytics task, written in Python that collects the amount of all ec2 instances, every 15 minutes, per many types/attributes.

Collected record example:

User | Instance type | our App version | zone | region | num of instances
Dave | t2.Smaill | v/3.2 | a | us-east1| 8

We store these records in Influx DB, which, of course declares a time stamp for every record.

Grafana is querying the data from influx db and aggregates the values (amount of instances) by attributes, such as ‘User’, and by time interval (which uses the time stamp that created by influxDB).
Query example:
SELECT sum(“count”) FROM "system_usage_table” WHERE $timeFilter GROUP BY time($interval), “zone” fill(null)

The problem:

The default aggregation by time can show wrong data when, for example, the time interval is 30 minutes, 2 records such as the above (presented in influxDB with different timestamp), will resulted in the graph as user Dave have '16 machines’ within this time range, which is not true, he never raised the amount over 8.

More than that, sometimes the records arrive in a time that is not fixed 15 minutes, so even if I aggregate with usual 15 minutes (as the scheduled task run) I get wrong data.

My naive suggestion:

Moving to SQL DB (PostgresSQL or MySql) and add new field per each record- ‘iteration’ which represents the analytics task ‘run number’. Then in Grafana, aggregating by iteration and then by any other added attribute to the query and present (not aggregate) the data on a time line.

Questions:

Do you think I can find a solution for this in our influxDB and Grafana?

If not, should my suggestion work?

Anybody else encounter the same problem?

Thanks for your help,

Oded

It looks like a wrong aggregation in your InfluxDB query, but nobody will be able to help because you didn’t provide used query.

Thanks
added an example, I hope it will enlighten the problem

My guess is that you want see average (MEAN()) value per period, not a sum (SUM())

SELECT MEAN(“count”) 
FROM "system_usage_table” 
WHERE $timeFilter 
GROUP BY time($interval), “zone” fill(null)

See doc: https://docs.influxdata.com/influxdb/v1.7/query_language/functions/