Yeah, I understand that @bbarin, but I tried locally and I can’t make it leak - it likely depends on your usage.
Can you share a snippet of code that represents your usage? A full-blown script that leaks and uses https://httpbin.test.k6.io would be best, but even “this is the code we had, with some name changes, and moving to this resolved the leak” will also be very useful.
The code in jslib is actually pulled from a fairly old version of core-js, but I can’t find any suggestion that older versions were leaky :(.
Do you still see the memory growing with this script and --no-thresholds --no-summary?
Do you use any outputs?
Also, I guess you run this through some build step, as optional chaining (?.) currently doesn’t work very well in k6. Can you provide the final script? Maybe in a gist if it’s huge.
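For illustration, a minimal skeleton of what such a reproduction could look like - the endpoint is the httpbin.test.k6.io instance mentioned above, and the body is a placeholder, not your actual code:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,        // scale towards the VU count where you see the growth
  duration: '30m',
};

export default function () {
  // Replace this body with the code that actually leaks for you.
  const res = http.get('https://httpbin.test.k6.io/get');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```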
@mstoykov, yep, we are experiencing the memory leak even with --no-thresholds --no-summary.
We have also tested without the check and the results are the same.
My k6 gets to around 5.3GB (after a minute or two to stabilize) and then, over 1 hour, inches very slowly towards 6GB. And that is actually the highest it has gone on any run; the rest were closer to 5.8GB.
There might be a leak, but I mostly expect the growth is due to the run being CPU bound and the GC just not being able to keep up.
Your graphs looked a lot steeper.
I also did some quick profiling and … nothing. Between heap dumps things go up and down, but ultimately that is normal. Sometimes there is more http2 machinery still in memory, sometimes more k6 internals, sometimes more JS objects, but ultimately the memory seems to get reused. Actually, between most of my heap dumps it looked like the amount of stuff in memory went down after the first few minutes.
Ultimately, I think that in my case the growing memory is due to 100% CPU usage and the GC just not keeping up every once in a while.
@mstoykov did you run with 8640 VUs? My machine is not able to run it; we run it on a server with 16 cores and 64 GB of RAM. Small numbers of VUs don’t reveal the memory leak - perhaps the object allocation rate is so high that the GC cannot keep up. I wonder how the docs can say a single machine handles ~30k VUs (with bigger resources, of course), yet somehow I don’t see us getting anywhere close to those numbers.
From my local testing, at least with this script on my laptop, the JSON output has problems writing out the huge amount of metrics the script is generating. Which basically means they keep piling up in the process because the JSON output can’t write them out fast enough.
What is your RPS in your real case? And does removing -o json help there?
The real scenario is around 1M rpm. Yes, removing the output helps keep the memory under control, but having no metrics is not an option for now, as we have no other way to check them (error rate, throughput, etc.).
The JSON and CSV outputs aren’t really … great to begin with, and at ~17k RPS I see no way they will be able to keep up without some kind of aggregation.
I would recommend one of the following:
fork the JSON output and try to make it work for you. You can probably keep only http_req_duration and skip all other metrics, which I would think will fix the issue for you. You can also just write to some other format - JSON isn’t known for its encode/decode speed. (There is also a small script-side sketch after this list.)
you can probably use the cloud output and build something that receives metrics from it. It at least supports HTTP request metric compression, which is specific to that output, and you will need to work with it. It is also likely that we will change the format at some point, so you might want to keep that in mind.
Given the above - the k6 cloud will also handle this kind of load ;).
Spreading this across multiple machines is also possible through direct use of execution segments, but then you will need to merge the JSON files at the end, so I’m not really certain this will work great. k6-operator can also be looked at, but it will also need the outputs merged at the end.
Try a different output … I would expect a telegraf+statsd combo like the one I have some configs for here might be able to handle 17k RPS.
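On top of the options above, and purely as a hedged script-side sketch (not something that will fix a 17k RPS output on its own): trimming the system tags reduces how much data every single metric sample carries, which lowers both the encoding work and the output volume. systemTags is a standard k6 option; this particular tag list is just an example, not a recommendation for your workload:

```javascript
import http from 'k6/http';

export const options = {
  // Keep only the tags you actually query on; every dropped tag shrinks
  // each line the JSON output has to serialize.
  systemTags: ['status', 'method', 'name'],
  discardResponseBodies: true,
};

export default function () {
  http.get('https://httpbin.test.k6.io/get');
}
```

It is cheap to try, but whether it is enough at your rate is a separate question.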
We are currently using the Kafka output, and the result is quite similar to JSON. I believe the problem is related to the unbounded nature of the channel that receives the metrics: as the CPU comes under pressure, the data waiting to be sent to the output starts to pile up.
We took the approach of scaling horizontally and reducing the VUs and throughput of each individual pod.
Thank you very much for your support! It was very much appreciated!
@bbarin I have made a PR with some optimizations for the JSON output. It likely still won’t manage this rate, but you can try it by building k6 from source.
If you do, please write back with how much better it actually behaves in your real-case scenario.
I’m testing a k6 script with 2600 concurrent users (VUs) running for 3600s on a machine with 15 GB of RAM, but after about 20 minutes it runs out of memory.
I used discardResponseBodies; my flow has about 30 API calls and uses the Gauge, Counter, Trend and Rate metrics for custom reporting.
Is there any way to reduce the CPU usage?
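Not knowing the actual script, here is a hedged sketch of a couple of patterns that usually keep memory and CPU down in this kind of setup: custom metrics created once in the init context, response bodies discarded globally, and dynamic URLs grouped under a single name tag so each unique URL doesn’t become its own time series. All names below are placeholders, not real endpoints or metrics:

```javascript
import http from 'k6/http';
import { Trend, Counter } from 'k6/metrics';

export const options = {
  discardResponseBodies: true, // keep bodies out of memory unless a check needs them
  vus: 2600,
  duration: '3600s',
};

// Custom metrics belong in the init context, created once, not inside the iteration.
const orderDuration = new Trend('order_duration', true); // hypothetical metric
const orderErrors = new Counter('order_errors');         // hypothetical metric

export default function () {
  // Group dynamic URLs under one `name` tag so each id doesn't create a new time series.
  const res = http.get(`https://test.k6.io/orders/${__VU}`, {
    tags: { name: 'GET /orders/{id}' },
  });
  orderDuration.add(res.timings.duration);
  if (res.status !== 200) {
    orderErrors.add(1);
  }
}
```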