xk6-dashboard HTML report shows more QPS than it's supposed to

I’m trying to run k6 with a ramping-arrival-rate scenario; for most of the test it should hold constant at 15,000 requests per second.

	scenarios: {
		ramping: {
			executor: "ramping-arrival-rate",

			// Pre-allocate necessary VUs.
			preAllocatedVUs: 50,
			maxVUs: 2000,

			stages: [
				{ duration: "1m", target: 15000 },
				{ duration: "10m", target: 15000 },
				{ duration: "1m", target: 0 },
			],
		},
	},

Both the k6 logs in the terminal and the metrics on my API server say the test correctly ran at 15,000 QPS, but the HTML report I got from xk6-dashboard shows 15,000 QPS for the first half of the test and then keeps rising in the later half. Does anyone know what might be causing this?

Hi @nattawitchaiworawit

The docs for this executor confirm that the target within stages is for iterations/time unit (complete execution of the main function), not qps/rps. Seeing that your results achieved 30K, I wonder if you have 2 requests within your main function. Try again with just 1 request (and don’t use sleep at the end).

Hi @nattawitchaiworawit, @richmarshall,

The docs for this executor confirm that the target within stages is for iterations/time unit (complete execution of the main function), not qps/rps.

Yeah, exactly, the target doesn’t speak about requests per second, but iterations.

Seeing that your results achieved 30K, I wonder if you have 2 requests within your main function. Try again with just 1 request (and don’t use sleep at the end).

It depends not only on the explicit requests present in your test script (i.e. calls to http.get or others); it might also be about redirections or similar (like here), which end up producing more requests than expected.
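As a back-of-envelope illustration of that point (the factor of 2 is hypothetical, e.g. one explicit call plus one hidden redirect per iteration):

```javascript
// Why target != QPS when an iteration produces more than one request.
const targetIterationsPerSec = 15000; // the `target` from the posted scenario
const requestsPerIteration = 2;       // hypothetical: one explicit call + one redirect
const observedQps = targetIterationsPerSec * requestsPerIteration;
console.log(observedQps); // 30000 -- matching the inflated QPS in the report
```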

I hope that helps clarify it!


Hi, @richmarshall, @joanlopez, thanks for the response.

My iteration has only one http.post call and no sleep. Is there a way to detect whether there are any redirect responses from the server? I’m quite sure my API server does not do any redirection, and since the QPS only starts to rise after a while, any redirection wouldn’t be deterministic and would need to be detected on the fly. I also got quite a few dropped_iterations; I’m not sure if this is related, and I’m still investigating the cause of those dropped iterations.

Hi @nattawitchaiworawit

I suggest trying some of the following; you may already have some of these in place, but you have not posted your whole script. Some of these ideas were provided here by @joanlopez, but I included some other links from the k6 docs.

  1. Verify exactly what your script is doing using the --http-debug CLI flag. Specifying --http-debug will log the request and response headers to the console; specifying --http-debug="full" will additionally log the request and response bodies in full. Either will show any 3xx response status code. Because the scrolling console gets unwieldy, this is really only appropriate for a single iteration as a single user. If your script is already correct (i.e. if the problem only appears under load), this may not reveal anything you need on redirects, but this debug is a critical tool in any event.
    k6 run --vus 1 --iterations 1 --http-debug="full" drive\path\to\script.js

  2. If your web application is https, ensure the request formation is https and not http. This may be the root cause of a redirect which generates an extra request.

  • For example: http.get('https://test-api.k6.io/public/crocodiles/') [https] generates a single request with response of HTTP/1.1 200 OK
  • Whereas plain http: http.get('http://test-api.k6.io/public/crocodiles/') first generates a HTTP/1.1 308 Permanent Redirect then HTTP/1.1 200 OK (and this will show as 2 requests in the default end-of-test summary output).
  3. The default handling of redirects is to allow up to 10. Even one unexpected redirect that only happens occasionally may dramatically increase your request count, and large numbers of load-test failures can occur at a faster rate than normal working requests. If you expect zero redirection under any “normal” circumstances, you can add an explicit maxRedirects: 0 to the global script options, and the request will fail if redirection happens.
  • Instead of setting this globally, you can alternatively define the redirect handling at the individual request level by adding the Params.redirects property.
  4. Another possibility is that your application periodically times out or has back-end errors only under more significant load and/or a longer test duration. In addition, some applications are coded to automatically redirect to a user-friendly error page, which may use a 2xx response status code but would contain different text in the body. To verify this is not happening, implement positive tests for your happy-path expected results via k6 checks (don’t forget the import). IMO using checks would be a normal best practice anyway. Minimally, add a check for the status code in the HTTP response (for example, your POST may return 201), but ideally also include a critical element from the HTTP response body. Checks just return true or false; the test keeps running, and your results will tally the % of each check that passed.
  • Taking this one step further you could also add an explicit negative test by adding a check on the response header for any specific 3xx status code (or any code that is not exactly what you expect) and increment a failure. It might be easiest to use the fail wrapper and immediately throw an error on the failing check, then a new iteration starts.
  5. In a well-behaved load test you would not need to log the full data for known working requests, but to help find where a problem starts you can choose to output some info on only the failing requests to the console. Create an if condition for success on the k6 checks and, if it is false, log the exceptions. I have never personally used these, but to better preserve larger quantities of errors, it might be better to avoid the console and output the errors to either JSON or CSV files.
  • console.log("response.status = " + response.status)
  • console.log("response.url = " + response.url)
  • console.log("response.body = " + response.body)
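A plain-JavaScript sketch of how suggestions 3–5 fit together (no k6 runtime here; in a real script maxRedirects lives in k6’s exported options object and check comes from the "k6" module — the response object below is faked for illustration):

```javascript
// Plain-JS sketch of suggestions 3-5 above (no k6 runtime needed to follow
// the logic). In a real script, maxRedirects lives in k6's exported `options`
// and `check` comes from the "k6" module; the response here is faked.

const options = {
  maxRedirects: 0, // fail fast on any 3xx instead of silently following it
};

// Stand-in for a k6 HTTP response: a 308 redirect like the one in suggestion 2.
const res = { status: 308, url: "http://example.local/path", body: "" };

// Same shape as k6's check(): a map of names to boolean predicates.
function runChecks(response, checks) {
  return Object.entries(checks).map(([name, fn]) => [name, fn(response)]);
}

const results = runChecks(res, {
  "status was 200": (r) => r.status === 200,
});

// Suggestion 5: only log details for failing requests.
for (const [name, ok] of results) {
  if (!ok) {
    console.log(`check failed: ${name}`);
    console.log("response.status = " + res.status);
    console.log("response.url = " + res.url);
  }
}
```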

P.S. Totally unrelated to all of the above: FYI, newer versions of k6 contain a native dashboard. AFAIK this has the same general functionality as xk6-dashboard. The dashboard works in dynamic mode while the test runs (open a browser to http://localhost:5665/ui/?endpoint=/), and you can export an HTML report at the end. It might be worth considering, to simplify dependencies.

@nattawitchaiworawit Here’s a step-by-step example for using the debugger, logging data and defining checks, and an unexpected redirect situation is part of the lesson.

@nattawitchaiworawit Strictly speaking this is a tangent, but if you need to define checks to make your test script more robust, you may want to consider the alternative of the Chai Assertion Library, which is another import:
import { describe, expect } from 'https://jslib.k6.io/k6chaijs/4.5.0.1/index.js';

I personally prefer this instead of the native k6 check. To me the syntax is cleaner (especially when using multiple expects) and I feel it has better error handling, although this may show more value in a script with 2+ requests. YMMV.

@richmarshall Thank you for the detailed debug steps. I’ll try those.

Here is the full script; please let me know if something is wrong. I run this in a k8s pod, so I call the service directly using its service name.

import http from "k6/http";
import { textSummary } from "https://jslib.k6.io/k6-summary/0.0.2/index.js";
import { check } from "k6";

const target = __ENV.TARGET || 10000;

export let options = {
	insecureSkipTLSVerify: true,
	summaryTrendStats: [
		"min",
		"max",
		"avg",
		"med",
		"p(99)",
		"p(99.5)",
		"p(99.9)",
	],
	discardResponseBodies: true,

	scenarios: {
		ramping: {
			executor: "ramping-arrival-rate",

			// Pre-allocate necessary VUs.
			preAllocatedVUs: 50,
			maxVUs: 2000,

			stages: [
				{ duration: "1m", target: target },
				{ duration: "10m", target: target },
				{ duration: "1m", target: 0 },
			],
		},
	},
};

const url =
	"http://service-name:8080/path";

const body = {
	id: "id",
};

export default () => {
	const params = {
		headers: {
			"Content-Type": "application/json",
			Accept: "application/json",
			Authorization: `Bearer ${__ENV.TOKEN}`,
		},
		timeout: "1s",
	};
	const res = http.post(url, JSON.stringify(body), params);
	check(res, {
		"status was 200": (r) => r.status == 200,
		"status was 403": (r) => r.status == 403,
		"status was 404": (r) => r.status == 404,
		"status was 422": (r) => r.status == 422,
		"status was 423": (r) => r.status == 423,
		"status was 500": (r) => r.status == 500,
		"status was 501": (r) => r.status == 501,
		"status was 502": (r) => r.status == 502,
		"no response": (r) => !r.status,
	});
};

export function handleSummary(data) {
	const output_text = `${__ENV.OUTPUT}.txt`;
	const output_json = `${__ENV.OUTPUT}.json`;
	console.log(output_text);
	console.log(output_json);
	return {
		stdout: textSummary(data, {}),
		[output_text]: textSummary(data, {}),
		[output_json]: JSON.stringify(
			{ http_req_failed: data.metrics.http_req_failed.values },
			null,
			2,
		),
	};
}

@nattawitchaiworawit

I can’t test against your application directly, but I have a few questions & comments (I am not saying for sure anything is wrong though):

  1. Your URL is "http://service-name:8080/path" so did you try https here?

  2. Script shows you are providing your auth token from the ENV variable but I don’t see where that is generated. Are you sure that the token remains valid for the test duration? If it expires before you are expecting and there is no process to refresh the token, maybe that causes the errors.

  3. In the params for your HTTP POST, I don’t understand the 1-second timeout. I wonder if this is causing the problem after load increases. The default request timeout is 60 seconds. When I try multi-threaded load with a 1-second timeout, many errors are logged to the console (k6 warnings). I think you can omit this parameter or set it to 1m. This part of the code is also where you could consider introducing the redirects parameter (set to 0).

WARN[0001] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0001] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0003] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0003] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0005] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0005] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0007] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0007] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0009] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
WARN[0009] Request Failed                                error="Post \"https://test-api.k6.io/auth/token/login/\": request timeout"
  4. Until you can debug your request and response, I would disable discardResponseBodies in the script options. You might be receiving an error or unexpected text in the response body. I believe discardResponseBodies: true is recommended as part of an approach to achieve the highest client load-generation performance, but you need to be really sure your script is operating correctly first.

  5. I have some questions on the ramping-arrival-rate executor. The startRate option has a default of 0, so that is likely OK to omit, but I am not sure about timeUnit; the documentation states the default is “1s”, but you are using “1m” or “10m” within your stages. I am not sure if there is a conflict here, since the documentation states: “Period of time to apply the startRate to the stages’ target value. Its value is constant for the whole duration of the scenario, it is not possible to change it for a specific stage.” Since I have not used this executor, personally I would set all the options from the documentation and use the value “1m” for timeUnit.

  6. Is it your intention to ramp up from 0 to 10,000 iterations per minute over a 1-minute timeframe? The iteration time is the main function’s execution time, and with a main function containing a single request, that would target ~167 iterations/second, which seems like moderately high load. Do you get the unexpected results with a smaller number of iterations, such as 1,000?

  7. I don’t understand this approach of defining multiple possible status codes within a check. The script as-is will fail almost all of the checks. Below is my experiment using your logic.

     ✓ status was 200
     ✗ status was 403
      ↳  0% — ✓ 0 / ✗ 100
     ✗ status was 404
      ↳  0% — ✓ 0 / ✗ 100
     ✗ status was 422
      ↳  0% — ✓ 0 / ✗ 100
     ✗ status was 423
      ↳  0% — ✓ 0 / ✗ 100
     ✗ status was 500
      ↳  0% — ✓ 0 / ✗ 100
     ✗ status was 501
      ↳  0% — ✓ 0 / ✗ 100
     ✗ status was 502
      ↳  0% — ✓ 0 / ✗ 100
     ✗ no response
      ↳  0% — ✓ 0 / ✗ 100

     checks.........................: 11.11% 100 out of 900

Furthermore, there is a suspicion of redirects happening but you are not including any 3xx status codes.

I would check for just the 200 status code and have an if condition generate a log-file record of any request whose status code is anything other than 200. If you set the redirects parameter to 0, either globally or in your POST params, that will help identify whether a redirect situation arises.
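As a concrete sketch of that last point, the POST params could pin redirects per request (plain object; redirects and timeout are documented k6 Params fields, and the header values are copied from the posted script, with the Authorization header omitted):

```javascript
// Request params sketch: generous timeout, zero tolerated redirects.
// Header values are copied from the posted script (Authorization omitted).
const params = {
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json",
  },
  timeout: "1m", // k6's default is 60s; the posted "1s" times requests out early
  redirects: 0,  // per-request equivalent of the global maxRedirects option
};
```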

As suggested before, I would add at least one check on some reliable element from the http response body. I have done quite a lot of testing of APIs with heavyweight http responses and k6 is highly performant even when doing parsing of a JSON response body for multiple deeply nested elements.

@richmarshall Thank you for your quick response again. I’ve pushed this back and am doing other work in the meantime; I’ll try to apply your suggestions when I get around to testing it again.

As for your questions

  1. HTTPS is applied at the load-balancer level outside of the k8s cluster, so this service only supports HTTP
  2. The token is fixed and never expires
  3. I added it back when I didn’t know about scenarios yet and just used VUs as an iteration counter, fixing the iteration time to 1s with timeout + sleep. I should remove it now that I’ve moved to scenarios.
  4. I’ll try that
  5. I’ll need to read more about this
  6. The target here is 10,000 iterations per second. It’s normal with a 5k test; I only get this result with loads of around 8–9k+. (The graph below is from a “constant” scenario though, not “ramping”)
  7. You can ignore this; I just want to see which HTTP codes my service responds with, not really check pass/fail here. But you’re right, maybe I should add 3xx too.
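For the record on the timeUnit question from point 5 earlier: with k6’s documented default timeUnit of "1s", a target of 10000 does mean 10,000 iterations per second. A fully explicit version of the scenario, as a plain-object sketch mirroring the posted script:

```javascript
// Fully explicit ramping-arrival-rate scenario (plain object; in k6 this
// sits inside the exported `options`). timeUnit applies to `target`, so
// target: 10000 with timeUnit: "1s" means 10,000 iterations per second.
const scenarios = {
  ramping: {
    executor: "ramping-arrival-rate",
    startRate: 0,    // documented default, made explicit
    timeUnit: "1s",  // documented default; "1m" would mean 10,000/min (~167/s)
    preAllocatedVUs: 50,
    maxVUs: 2000,
    stages: [
      { duration: "1m", target: 10000 },  // ramp 0 -> 10k iters/s over 1 minute
      { duration: "10m", target: 10000 }, // hold
      { duration: "1m", target: 0 },      // ramp down
    ],
  },
};
```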