Hello Team,
I am using a simple script to post a transaction with 1000 VUs, and a data store with 250k records (a single column and 250k rows).
With 100 VUs it worked, but with 1000 VUs I am getting an out-of-memory error.
What do I do?
Please let me know if you need any additional info.
We were running a test with a data store that had 1.7 million records in Load Impact (version 3.0) without any issues.
What type of data store are you using? I’m asking because we’ve noticed that CSV parsing with some popular JS libraries like papaparse takes up a surprisingly large amount of RAM, so if that’s the case for you, directly loading JSON files or plain text files might be a partial short-term workaround.
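For illustration, a minimal sketch of what loading a JSON data file in the init context could look like (the file name and its structure are assumptions, not taken from your script):

```javascript
// Sketch: load a JSON data file once per VU in the init context.
// open() is only available in the init context; JSON.parse avoids the extra
// overhead of a CSV-parsing library such as papaparse.
const users = JSON.parse(open('./users.json')); // e.g. ["user1", "user2", ...]

export default function () {
  // Pick a record for this iteration.
  const user = users[Math.floor(Math.random() * users.length)];
  // ... use `user` when building the request ...
}
```

Each VU still keeps its own copy of the parsed array, so this only reduces the parsing overhead, not the per-VU duplication.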
There are other tricks you can use to reduce k6 memory usage (like discardResponseBodies and the upcoming --compatibility-mode=base option), but these won’t fully make up for a huge static file loaded in each VU. Unfortunately, until we fix the underlying issues, we’re unlikely to support millions of data store records with lots of VUs on the same machine. So until then, you’d need a bigger machine, smaller data store files, and/or fewer VUs per machine…
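As a hedged example of the discardResponseBodies trick (the VU count, duration, and URL below are placeholders), it is just a top-level option in the script; --compatibility-mode=base is a separate CLI flag passed to k6 run rather than something set in the script:

```javascript
import http from 'k6/http';

export const options = {
  vus: 1000,
  duration: '10m',
  // Response bodies are not kept in memory, which noticeably reduces RAM
  // usage when the responses are large and you don't need to inspect them.
  discardResponseBodies: true,
};

export default function () {
  http.get('https://test.k6.io/'); // res.body will be null with the option above
}
```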
Thanks Nedyalko. I am using a CSV file; I will try using a JSON file. Even with a JSON file, each VU will still have its own copy of that file in memory, right? Correct me if I am wrong. Could you also tell me if you have an estimate of when this will be fixed?
Meanwhile, I will try your tricks.
As of now I was able to run a test with 250 VUs using a JSON file, which takes 60 GB of memory. Eventually we will need to run a test with 7500 VUs.
In the next few months. The current priority is finally getting k6 v0.26.0 released (next Monday) and then finishing #1007 (hopefully early January). One of us will probably start working on the shared and streaming read-only memory (i.e. data stores) immediately after that. It’s probably going to take at least a few weeks, since, as I pointed out in the CSV issue, there are some complexities involved and we need to design the APIs to be composable.
Hello @ned, sorry to bring up an oldie, but I didn’t want to create a duplicate topic. Can you or someone in the community shed light on any enhancements in this area (CSV API · Issue #1021 · grafana/k6 · GitHub), namely seed data partitioning? Maybe defining a method to use only enough unique data to complete a test of a given duration. Is there now a way to use only partial segments of a data file, rather than reading an entire copy into memory?
Hello @amruth.chintha,
Saw your post and had to contribute a bit.
As was mentioned, each VU will get a copy of your file, and as you multiply the users, the memory requirements grow accordingly.
A solution I have used in the past was to segment my data file.
It may seem rudimentary, but I would split it per VU, ending up with dataFile001.csv to dataFile999.csv, and load the right one using the VU ID. With that you would have only 250 rows per file and most probably no memory problems.
I know splitting the file may seem tedious, but a script could easily help with that.
Then I may not even use SharedArray… just return papaparse.parse(open('./dataFile' + users[vu.idInTest] + '.csv'), { header: true }).data; (see the sketch below).
Just some ideas, I hope that helps; I am working on some posts that explain more ways to tackle this issue.
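A sketch of this split-file idea, assuming files named dataFile001.csv through dataFile999.csv sit next to the script, and using the global __VU variable instead of a users lookup. Note that __VU is 0 during k6’s initial parse of the init code, hence the guard, and that this pattern assumes a local run where all the files are on disk:

```javascript
import papaparse from 'https://jslib.k6.io/papaparse/5.1.1/index.js';

let data = [];
if (__VU > 0) {
  // Pad the VU number to match dataFile001.csv, dataFile002.csv, ...
  const fileName = './dataFile' + ('000' + __VU).slice(-3) + '.csv';
  data = papaparse.parse(open(fileName), { header: true }).data;
}

export default function () {
  const row = data[Math.floor(Math.random() * data.length)];
  // ... use `row` in the request ...
}
```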
That issue hasn’t had any updates, and due to the existence of SharedArray it likely will not get implemented, and very likely not in the way it was discussed 2+ years ago.
@mstoykov would you ‘not’ recommend partitioning a JSON-formatted data file, given that SharedArray does most of this heavy lifting?
I am using SharedArray, but I have had problems with EC2 instance sizing when data files exceed 60M. After a short period of time with smallish EC2 instances, my Jenkins controller would lose connectivity to the AWS instance; after investigation, the EC2 instance had simply crashed due to memory exhaustion. I’m trying to find the sweet spot between data file size, executor options, EC2 load generator instance size, and k6 script utilization across hundreds of service API profiles. In short, I’m trying to get as close to one-size-fits-all as possible.
For example, using an m5.8xl might make sense for a test suite with “heavy” POST calls and high-rate GET calls approaching 12K TPS (sourced from a 3-million-record data file), but using the same instance with the same data file for a 200 TPS test suite is overkill. I want to limit how many EC2 load generator sizes I need to cover the different load/memory situations.
I was hoping a nice, tidy solution was in the k6 SDK :).
would you ‘not’ recommend partitioning a JSON-formatted data file, given that SharedArray does most of this heavy lifting?
I would argue it isn’t needed. SharedArray makes it so there is only one copy of the whole data, so it should be the same as splitting it into one piece per VU. Splitting the data into 2 parts to be shared by 50% of the VUs each, with SharedArray on top of that, should have the same memory characteristics. If it makes the logic easier (you need to split the data into N pieces and N scenarios to work on it), go for it, but otherwise it shouldn’t matter.
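For reference, a minimal sketch of the SharedArray usage being discussed (the file name and parsing are assumptions): the function passed to the constructor runs only once, and every VU reads from the same underlying copy.

```javascript
import { SharedArray } from 'k6/data';
import papaparse from 'https://jslib.k6.io/papaparse/5.1.1/index.js';

// The callback executes a single time; the returned array is shared
// read-only across all VUs instead of being duplicated per VU.
const data = new SharedArray('users', function () {
  return papaparse.parse(open('./users.csv'), { header: true }).data;
});

export default function () {
  const row = data[Math.floor(Math.random() * data.length)];
  // ... use `row` ...
}
```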
exceed 60M.
What is M here? A million data points? How big is that?
after investigation the ec2 instance simply crashed due to memory exhaustion
Are you certain that all the opening and processing of the data happens inside the SharedArray?
POST calls
Uploading data with k6 is currently not very optimized, as has been discussed in this issue. You might need to try different ways of building the body, and maybe cache it, which might make the uploads … more performant.
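As a hedged illustration of the “building the body and maybe caching it” point (the URL and payload below are made up): serialize the payload once in the init context and reuse the resulting string, rather than rebuilding it on every iteration.

```javascript
import http from 'k6/http';

// Built once per VU in the init context and reused by every iteration.
const payload = JSON.stringify({ name: 'order', items: new Array(1000).fill('x') });
const params = { headers: { 'Content-Type': 'application/json' } };

export default function () {
  http.post('https://example.com/api/orders', payload, params);
}
```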
I was hoping a nice tidy solution was in the k6 sdk :).
Given that this depends on how the script behaves during execution, there are no magic bullets, sorry.
I would also recommend opening new topics when you have questions instead of “resurrecting” 2-year-old ones.
I would suggest creating a web server using https://gin-gonic.com/ or any other framework and exposing an endpoint such as http://localhost:8899/getdata.
You can call this endpoint in your script to get the data into a virtual user.
The program should read the CSV into an array and keep serving new data whenever a request is made to the endpoint.
Gin can easily handle 5K requests/sec, and the web server takes around 20 lines of Go code.
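A sketch of the k6 side of this setup (the port, path, and response shape are assumptions about the server described above): each iteration fetches the next record from the local data server instead of every VU loading the whole file.

```javascript
import http from 'k6/http';

export default function () {
  // The local Go/Gin service hands out one record per call.
  const res = http.get('http://localhost:8899/getdata');
  const record = JSON.parse(res.body); // e.g. { "username": "...", "id": 123 }
  // ... use `record` in the actual request under test ...
}
```

Keep in mind that these calls to the data server are also counted in k6’s HTTP metrics unless you tag and filter them out.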
I’m testing a k6 script with 2600 CCU running for 3600s on a machine with 15 GB of RAM, but after about 20 minutes it runs out of RAM.
I used discardResponseBodies; my flow has about 30 APIs, and I use the custom metrics Gauge, Counter, Trend, and Rate for reporting.
Is there any way to reduce the memory/CPU usage?