Metrics aggregation in large test seems to cause excessive memory consumption

ian.frazer · October 25, 2022, 3:53pm

Hi there,

I have a k6 test that consumes excessive, constantly increasing memory unless I specify --no-thresholds and --no-summary. The test only uses 50 VUs, but memory ramps up linearly over a couple of hours to over 25GB, at which point I hit OOM problems.

I’ve been through the “Running large tests” guide and tried all the suggestions - the only thing that has a significant impact on memory consumption for my test is adding --no-thresholds and --no-summary.

I believe this is because of the nature of the URLs used in the test - I have approximately 30 unique URL “patterns”, but each of them contains a dynamically generated number with 10 million possible values.

I’ve tried using URL grouping - both explicitly with the “name” tag within the request, and with http.url. I can see the grouping working, but it doesn’t affect the memory growth.
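For anyone unfamiliar with URL grouping, here is a minimal sketch of the idea in plain JavaScript: collapse the dynamic number in each URL into a fixed placeholder so that every request to one pattern shares a single name value. The helper `groupName`, the `:id` placeholder and the example.test URLs are hypothetical and only for illustration; they are not taken from my actual test.

```javascript
// Hypothetical helper: collapse purely numeric path segments into a placeholder
// so that every URL of one pattern maps to the same `name` tag value.
function groupName(method, url) {
  const path = url.replace(/^https?:\/\/[^/]+/, ''); // drop scheme and host
  const grouped = path
    .split('?')[0] // ignore the query string
    .split('/')
    .map((segment) => (/^\d+$/.test(segment) ? ':id' : segment))
    .join('/');
  return `${method} ${grouped}`;
}

// Two URLs of the same pattern (made-up examples) produce the same name.
const a = groupName('GET', 'https://example.test/api/items/1234567');
const b = groupName('GET', 'https://example.test/api/items/7654321?verbose=1');
console.log(a);       // GET /api/items/:id
console.log(a === b); // true
```

In a k6 script the resulting string would go into the request params as `tags: { name: ... }`.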

(Side note - I don’t think the “name” tag works properly for http.del() requests - I can only get it to work if I set a global name tag within options)

Setting --no-thresholds and --no-summary, I see no memory growth. Setting the output to cloud and looking at the performance insights, I don’t get warnings about the number of URLs or metrics, showing the grouping is working as expected.

However, I need local summary to work as I need to run tests for several hours, and everything else is working locally except the metrics reporting.

I attempted to output to a local influxdb but the volume / cardinality of metrics quickly blow memory and CPU on that too. Looking at the raw metrics by outputting to json, I see that although the name tag is correctly set, the unique url is still included with every metric - I believe this is the problem.
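One way to check this in the JSON output is to count the distinct values per tag. The sketch below assumes each output line is a JSON object, that metric samples have type "Point", and that their tags sit under data.tags, which is how I understand the k6 JSON output to be laid out. The function name `tagCardinality` and the sample lines are made up for illustration.

```javascript
// Count distinct values per tag across k6 JSON output lines.
function tagCardinality(lines) {
  const seen = new Map(); // tag name -> Set of distinct values
  for (const line of lines) {
    if (!line.trim()) continue;           // skip blank lines
    const entry = JSON.parse(line);
    if (entry.type !== 'Point') continue; // skip metric definitions
    const tags = (entry.data && entry.data.tags) || {};
    for (const [tag, value] of Object.entries(tags)) {
      if (!seen.has(tag)) seen.set(tag, new Set());
      seen.get(tag).add(value);
    }
  }
  const counts = {};
  for (const [tag, values] of seen) counts[tag] = values.size;
  return counts;
}

// Made-up sample lines: the name tag is grouped, the url tag is still unique.
const sample = [
  '{"type":"Metric","metric":"http_req_duration","data":{"type":"trend"}}',
  '{"type":"Point","metric":"http_req_duration","data":{"value":12.3,"tags":{"name":"GET /api/items/:id","url":"https://example.test/api/items/1"}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"value":9.8,"tags":{"name":"GET /api/items/:id","url":"https://example.test/api/items/2"}}}',
];
const counts = tagCardinality(sample);
console.log(counts); // { name: 1, url: 2 }
```

A url count that keeps growing while the name count stays at the number of patterns would match what I am seeing.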

I tried to manually overwrite the url tag globally to a dummy value to get around this but it doesn’t seem to be possible. Is there any way to get around this, or anything else I should look at to potentially help the situation?

codebien · October 26, 2022, 10:19am

Hi @ian.frazer,
welcome to the community forum.

Thanks for taking the time to describe your problem in detail. If URL grouping with the name tag works for you, you could simply remove url from the systemTags option. It would also be worth removing every other tag you don’t need from that list.
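A minimal sketch of how such a reduced list could be built. The default list below is what I believe the k6 options documentation gives, so treat it as an assumption that may differ between versions; which tags to drop besides url is only an example and depends on what your thresholds and dashboards need.

```javascript
// Default system tags as I recall them from the k6 options docs
// (assumption: the exact list may differ between k6 versions).
const DEFAULT_SYSTEM_TAGS = [
  'proto', 'subproto', 'status', 'method', 'url', 'name', 'group', 'check',
  'error', 'error_code', 'tls_version', 'scenario', 'service', 'expected_response',
];

// `url` carries the unique URL; the other entries are only example choices.
const DROP = ['url', 'proto', 'subproto', 'tls_version', 'service'];

const systemTags = DEFAULT_SYSTEM_TAGS.filter((tag) => !DROP.includes(tag));

// In the k6 script the list would be exported as part of the options:
//   export const options = { systemTags };
console.log(systemTags.join(','));
```

Keeping name in the list matters here, because the grouping relies on it.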

> (Side note - I don’t think the “name” tag works properly for http.del() requests - I can only get it to work if I set a global name tag within options)

Thanks for reporting, going to open an issue to fix it.

ian.frazer · October 26, 2022, 11:54am

Thanks for the reply, and highlighting the systemTags option I was unaware of. Unfortunately even after removing the majority of the system tags, and attempting some tuning of influxdb, it seems there’s still too much data for influxdb to cope.

Reducing the tags also doesn’t appear to have made a significant difference to k6’s own memory consumption when running with thresholds and summary active. I presume at this point I’m waiting on the implementation of “Use HDR histograms for calculating percentiles in thresholds and summary stats” (grafana/k6 issue #763) before I can keep these options enabled for longer-running tests?

codebien · October 26, 2022, 3:01pm

Regarding the side note about http.del(), I have to correct myself: I think you are passing the params in the wrong position. As documented, params is the second argument for http.get, but for methods that accept a body it is the third.

> it seems there’s still too much data for influxdb to cope.

In this case, I think you have to use telegraf for aggregating more.

Yeah, unfortunately in this specific case if you need the local run + summary and threshold then there aren’t easier solutions.

PlayStay · October 26, 2022, 9:46pm

@codebien are you saying that when unique URIs are used together with a custom name tag, k6 still exports the unique URI?

Here is a case where a unique URI is produced because of accountIds, but I’m aggregating the requests with the tags param. Is there a bug in this operation, based on what Ian reported?

	var uri = `/api/${accountId}`;
	var url = urlbase + uri;
	const params = {
		headers: {
			'Authorization': 'Bearer ' + api_token,
			'X-PSN-QA-Data-Marker': 'ltip',
			'Content-Type': 'application/json',
		},
		tags: {
			// one fixed name for every accountId, so all requests are grouped
			name: 'GET /api/accountIds'
		},
		timeout: request_timeout,
	};

codebien · October 27, 2022, 8:10am

Hi @PlayStay,
no, I meant that I expect @ian.frazer is doing:

http.del("https://httpbin.test.k6.io/anything", params)

instead, the correct call is:

http.del("https://httpbin.test.k6.io/anything", null, params)

because the del method’s signature expects a body as the second argument: del(url, [body], [params]).
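To illustrate why the tags get lost without any error, here is a sketch using a stand-in function with the same positional signature. `fakeDel` is hypothetical and only echoes its arguments; it is not k6 code and makes no request.

```javascript
// Hypothetical stand-in mirroring the positional signature del(url, [body], [params]).
function fakeDel(url, body, params) {
  return {
    url,
    body: body === undefined ? null : body,
    tags: params && params.tags ? params.tags : {},
  };
}

const delParams = { tags: { name: 'DELETE /anything' } };

// params in the second slot is taken as the body, so no tags are applied.
const wrong = fakeDel('https://httpbin.test.k6.io/anything', delParams);
// With null as the body, params lands in the third slot.
const right = fakeDel('https://httpbin.test.k6.io/anything', null, delParams);

console.log(wrong.tags); // {}
console.log(right.tags); // { name: 'DELETE /anything' }
```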

ian.frazer · October 27, 2022, 9:19am

Thanks - You’re correct I was indeed calling http.del() incorrectly. With the args set up properly it works as expected.

Thanks for the pointer to use telegraf - will give that a try.
