Resource consumption on high load test

Hi,

Our group is evaluating several tools for performance testing.
During the evaluation of k6 we noticed very high (crazy) resource consumption.

The scenario is quite simple but heavy:

Run a constant 23000 requests per second (yes, we need exactly this number) for a few hours against a single API.

When running the following simple test we started with an m5.8xlarge. We reached 99% node CPU consumption after 8 minutes, and the RPS was only 2500. We doubled the machine to an m5.16xlarge (64 CPUs, 256 GiB) and got about ~4800 RPS, but the node CPU was throttling, as the k6 process consumed 62 CPUs before it reached its peak. We can try a larger machine, but I guess we still won't reach our KPI. Is there any limitation in k6 for our numbers?
To verify, we created a new cluster with only k6 on it, and the cluster monitoring clearly shows that k6 is taking all the resources.

Should we run the test differently?

import http from 'k6/http';
import { check } from 'k6';

export let options = {
	insecureSkipTLSVerify: true,
	scenarios: {
		test1: {
			executor: 'constant-arrival-rate',
			exec: 'testFunc',
			rate: 23000,
			timeUnit: '1s',
			duration: '6000s',
			preAllocatedVUs: 23000,
			maxVUs: 23000,
		},
	},
	thresholds: {
		'http_req_duration': ['p(99)<2000'],
		'health is ok': ['p(99)<2500'],
	},
};


export function testFunc() {
	let r = http.get(`<ourAPI>`);
	check(r, {
		'ok': (res) => res.status === 200,
	});
}

Please let us know how to proceed, or whether we should try a different type of test…

Something strange is going on here, k6 should be able to handle a lot more than what you’re struggling with… :confused: I’ve reached 2/3 of your desired RPS on my laptop, albeit while testing against localhost, but still…

Can you give us some more details? Specifically, which k6 version and OS are you using, and more importantly, are you using any external metrics outputs like InfluxDB?

We have a docs article for different tweaks to allow k6 to run large tests: Running large tests

In your case, I am not sure if --compatibility-mode=base would help a lot, but adding discardResponseBodies: true to the options is probably a very good idea.
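
For example, here is a sketch of your script above with that option added (everything else stays the same; <ourAPI> remains a placeholder):

import http from 'k6/http';
import { check } from 'k6';

export let options = {
	insecureSkipTLSVerify: true,
	// Don't keep HTTP response bodies in memory at all; status codes and
	// headers are still available, so the status check below keeps working.
	discardResponseBodies: true,
	scenarios: {
		test1: {
			executor: 'constant-arrival-rate',
			exec: 'testFunc',
			rate: 23000,
			timeUnit: '1s',
			duration: '6000s',
			preAllocatedVUs: 23000,
			maxVUs: 23000,
		},
	},
};

export function testFunc() {
	let r = http.get(`<ourAPI>`);
	check(r, { 'ok': (res) => res.status === 200 });
}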

We are running the latest version of k6, 0.30, and the image is loadimpact/k6:latest.

Yes, we are using Grafana and InfluxDB; however, the monitoring doesn't show that high load, neither in Grafana nor in InfluxDB. I'll try soon with --compatibility-mode=base and discardResponseBodies: true and let you know.

Hi @Bred,
using outputs, and especially InfluxDB, takes more memory, and even after some optimizations this didn't get a lot better.

Also, InfluxDB usually can't keep up with k6. Can you run with -v and see how long it takes to write to InfluxDB? There will be an InfluxDB: Batch written! line with a t=<pretty printed duration>, which ideally should be below 1 second, as that is the interval at which we try to write to InfluxDB. After some of the recent changes we now buffer and retry writes (so you don't lose data), but this still means that InfluxDB needs to write them faster than k6 produces them; if it can't, k6 will just keep them in memory.
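
Something like this should show those log lines (a sketch; the output URL and script name are placeholders for your setup):

k6 run -v --out influxdb=http://localhost:8086/k6 script.js 2>&1 | grep "Batch written"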

Additionally, if you are not using the summary output and thresholds, adding --no-summary --no-thresholds will help. And if you are running Docker, you can just run with loadimpact/k6:master, which reduces memory usage just as much as --compatibility-mode=base does, but doesn't require you to rewrite your script in ES5.1 syntax.
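
For example, with Docker that could look roughly like this (a sketch; the script is read from stdin, and the flags are the ones mentioned above):

docker run --rm -i loadimpact/k6:master run --no-summary --no-thresholds - < script.js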

As an optimization for InfluxDB, I would recommend trying to send more of the system tags as fields instead of tags; this will probably require dropping the InfluxDB database, as it will already have them stored as tags.
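
If I remember correctly, the InfluxDB output has an environment option for exactly this; something along these lines (please check the InfluxDB output docs for your k6 version for the exact option name and field syntax):

K6_INFLUXDB_TAGS_AS_FIELDS="vu:int,iter:int,url" k6 run --out influxdb=http://localhost:8086/k6 script.js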

You could maybe even try putting Telegraf in the middle to aggregate some of the metrics before they get written to InfluxDB. I haven't tried that and can't find a good full example currently, so if you do it, please share :)

Hope this helps and good luck

@mstoykov @ned thanks.

Well, we (my team) tried many things: ES5 and base mode (which didn't help much), discarding the response bodies, etc. It still requires many CPUs (> 60) and doesn't help much.

The memory usage is around 90 GB, but that is not the issue.

However, changing the test from

import http from 'k6/http';
import { check } from 'k6';

export let options = {
	insecureSkipTLSVerify: true,
	scenarios: {
		test1: {
			executor: 'constant-arrival-rate',
			exec: 'testFunc',
			rate: 23000,
			timeUnit: '1s',
			duration: '6000s',
			preAllocatedVUs: 23000,
			maxVUs: 23000,
		},
	},
	thresholds: {
		'http_req_duration': ['p(99)<2000'],
		'health is ok': ['p(99)<2500'],
	},
};


export function testFunc() {
	let r = http.get(`<ourAPI>`);
	check(r, {
		'ok': (res) => res.status === 200,
	});
}

To this

var http = require('k6/http');
var k6 = require('k6');

export const options = {
    insecureSkipTLSVerify: true,
    stages: [
        {target: 20000, duration: '2m'},
        {target: 30000, duration: '5m'},
        {target: 0, duration: '1m'},
    ],
    thresholds: {
        requests: ['count < 100'],
    },
};
export default function () {
    // our HTTP request, note that we are saving the response to res, which can be accessed later
    const res = http.get(`<ourAPI>`);
    k6.sleep(0.01);
    var checkRes = k6.check(res, {
        'status was 200': function (r) {
            return r.status == 200;
        },
    })
}

Now we were able to reach 13000 RPS (which is much better, but still not what we need…), with only 10-15 CPUs at peak.

A few questions :slight_smile:

  1. Could you please explain why the first test is so CPU intensive?
  2. How does the timing work here? What does it actually mean to use k6.sleep(0.01) or k6.sleep(1)? Is this the number of iterations per second?

We need to scale to a constant rate of about 23000 RPS; how should we modify the second test?
Update:
Well, the memory is also an issue.

[image]
Thanks!

Sorry for the slow reply, I needed to run some experiments and we had some discussions about this.

Here is my comment in an issue with some findings, but the gist is that the arrival-rate executors still need… work to become more performant.

I recommend (with some input from colleagues) that you try the following (in this order):

  1. Lowering the VUs to 100-200, as we think you should not need 23k. After running with this for a while, k6 will either have initialized more VUs (up to maxVUs) or you have enough; you can then set your VUs to (or above) the number of VUs k6 actually needs. How low you can go depends a lot on how long the requests take.
  2. Using ramping-arrival-rate should be “slightly” better, and it is at least a fairly simple fix. The reason is mostly that it has had more work put into it, so it handles some… corner cases better, especially with high rates.
  3. Using http.batch you can make multiple requests per iteration, so instead of 1 VU doing 1 request you can have it do 5, 10 or 20 (see the sketch after this list). Please set the batch and batchPerHost options accordingly. This does mean it will likely make fewer “connections” to the server, though, which might not be what you want.
  4. constant-vus is by far the executor with the least amount of overhead. You can use it together with either rps (which in general has bad performance and is deprecated in favor of the arrival-rate executors) or the JS approach explained here. I haven't tested how either of them stacks up against arrival-rate, so they might just as well be worse :man_shrugging:
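
To illustrate option 3, here is a rough sketch (the numbers are only illustrative and <ourAPI> is a placeholder): each iteration fires 10 parallel GETs via http.batch, so 2300 iterations per second would give roughly your 23000 RPS target.

import http from 'k6/http';

export const options = {
    batch: 10,        // max parallel requests per http.batch() call
    batchPerHost: 10, // max parallel requests per host
    scenarios: {
        test1: {
            executor: 'constant-arrival-rate',
            rate: 2300, // 2300 iterations/s x 10 requests each ~= 23000 RPS
            timeUnit: '1s',
            duration: '6000s',
            preAllocatedVUs: 200,
            maxVUs: 2000,
        },
    },
};

export default function () {
    const urls = [];
    for (let i = 0; i < 10; i++) {
        urls.push(`<ourAPI>`);
    }
    // 10 GET requests in parallel, reusing this VU's connections
    http.batch(urls);
}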

Hope this helps you and hopefully the optimizations will land in v0.32.0 :wink:

Hi @mstoykov,

Thanks for elaborating!

To verify, should I run the following?

exports.options = {
    scenarios: {
        test1: {
            executor: 'ramping-arrival-rate',
            preAllocatedVUs: 100,
            timeUnit: '1s',
            stages: [
                {target: 100000, duration: '0s'},
                {target: 100000, duration: '6000s'},
            ],
            exec: 'myFunc',
        },
    },
};

exports.myFunc = function () {}

Another question: in case, for example, I want to run 10000 RPS
but share connections between 1000 VUs running the 10000 requests, is that possible?
e.g. like 10 users running 1000 requests per second each

Have a good day!

Hi @Bred,
You should set the target to 23000 as you wanted, and keep maxVUs at something high so k6 will try to initialize more VUs if there aren't enough.
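
For example, something along the lines of your last snippet with those two changes (the VU numbers are just illustrative):

exports.options = {
    scenarios: {
        test1: {
            executor: 'ramping-arrival-rate',
            preAllocatedVUs: 200,
            maxVUs: 10000, // something high, so k6 can initialize more VUs if it needs them
            timeUnit: '1s',
            stages: [
                {target: 23000, duration: '0s'},    // jump straight to the desired rate
                {target: 23000, duration: '6000s'}, // and hold it there
            ],
            exec: 'myFunc',
        },
    },
};

exports.myFunc = function () {}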

but share connections between 1000 VUs running the 10000 requests, is that possible?
e.g. like 10 users running 1000 requests per second each

VUs can't share connections between them, but each VU keeps its own connections and reuses them for its requests. So, for example, using http.batch as mentioned above will likely use the same connection for multiple parallel requests, and the connections are also reused between iterations, per VU.