Aggregate data ingested every second into 1-minute buckets

Hi everyone,

I have set up a MariaDB database from which I ship data to Elasticsearch via the Logstash jdbc input plugin. The data is ingested from the database's processlist table every second, so every second I get a list of user names in Elasticsearch like this (a rough sketch of the pipeline config follows the example):

user1
user1
user1
user2
user3
user3
etc…
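
For reference, the pipeline is roughly like this (a simplified sketch, not my exact config; the driver path, credentials and index name are placeholders):

input {
  jdbc {
    jdbc_driver_library => "/path/to/mariadb-java-client.jar"
    jdbc_driver_class => "org.mariadb.jdbc.Driver"
    jdbc_connection_string => "jdbc:mariadb://localhost:3306/information_schema"
    jdbc_user => "monitor"
    jdbc_password => "secret"
    # 6-field rufus-scheduler cron: run the query every second
    schedule => "* * * * * *"
    statement => "SELECT user, host, db, command, time, state FROM PROCESSLIST"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "processlist-%{+YYYY.MM.dd}"
  }
}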

There can be a lot of users connected from different places, and I need to count them. From the example above, user1 occurs 3 times, user2 once and user3 twice: (user1:3, user2:1, user3:2). Since the data is recorded every second, the values barely change between snapshots, but when I query it in Grafana the counts add up every second, so after 2-3 seconds it shows something like (user1:9, user2:3, user3:6) even though the real numbers are still (user1:3, user2:1, user3:2).

Is there any option to aggregate the data over a minute, i.e. see the value of a single one-second record per one-minute interval? I could ingest the data every minute instead, but I need the one-second ingestion for another dashboard that uses the same data source.
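
To make it concrete, I think what I'm after is something like the query below, run from Dev Tools (the index pattern processlist-* and the fields @timestamp and user.keyword are just my guesses at the mapping). It averages the per-second counts inside each minute instead of letting them add up:

GET processlist-*/_search
{
  "size": 0,
  "aggs": {
    "per_minute": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
      "aggs": {
        "per_user": {
          "terms": { "field": "user.keyword" },
          "aggs": {
            "per_second": {
              "date_histogram": { "field": "@timestamp", "fixed_interval": "1s" }
            },
            "avg_per_second": {
              "avg_bucket": { "buckets_path": "per_second>_count" }
            }
          }
        }
      }
    }
  }
}

If I read the docs right, avg_bucket gives the average concurrent sessions per user within each minute (max_bucket would give the peak second instead), though it still builds all the one-second buckets internally.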

I know this isn't as clear as it could be, so if anything is hard to follow I'm happy to clarify!

Thanks for any piece of advice.

Try this:

Thanks for the suggestion. I've tried it already, but it only works within a 15-minute time range; beyond that it throws "Unknown elasticsearch error" or "too many buckets". I have a lot of users active at any given time, so I need a query that shows me only the data from a one-second interval even if I set the time range to something like 6 or 24 hours…

You must increase search.max_buckets in Elasticsearch, like below:

PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 200000
  }
}

You can run this command in Dev Tools in Kibana.
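
For a sense of scale: a 1-second date histogram over a 6-hour range is already 6 × 3600 = 21,600 time buckets, and a terms split on the user name multiplies that, so 50 distinct users means over a million buckets. That is why the default limit (10,000 on older releases, 65,536 on newer ones) trips as soon as the time range grows.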

Thanks! I've actually heard of it, and I've read somewhere that this solution is not recommended by Elastic. Do you know what kind of impact it has on the cluster and the Elastic Stack as a whole?

Thank you for the reply.

You can add more compute and memory to the cluster, but calculating per 1 second is very expensive, especially for dashboarding, so only raise the limit if you really have to.
If you calculate per 1 second, it creates a huge number of buckets in Elasticsearch and that hurts performance, because the time interval determines how many calculation buckets get created.
I suggest just using a 10s interval for the time bucket; no data goes missing, you only change the calculation to a sum, average, max, min, etc. within each bucket (see the sketch below).
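
A rough sketch of that in Dev Tools, reusing the same assumed index and field names as the query above (and assuming I have the bucket_script syntax right): a 10s time bucket, a terms split per user, and a bucket_script that divides each bucket's doc count by the ~10 one-second snapshots it contains, so you get the average concurrent sessions instead of a sum:

GET processlist-*/_search
{
  "size": 0,
  "aggs": {
    "per_10s": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "10s" },
      "aggs": {
        "per_user": {
          "terms": { "field": "user.keyword" },
          "aggs": {
            "avg_concurrent": {
              "bucket_script": {
                "buckets_path": { "total": "_count" },
                "script": "params.total / 10.0"
              }
            }
          }
        }
      }
    }
  }
}

The hard-coded 10.0 assumes exactly one snapshot per second lands in each bucket.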

Thanks, I’ll give it a try.