I have a test scenario that uses the ramping-arrival-rate executor and defines several stages with increasing target RPS (requests per second).
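For reference, the scenario definition looks roughly like this (the durations, intermediate targets and VU settings below are illustrative, not my exact values):

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    ramping_rps: {
      executor: 'ramping-arrival-rate',
      startRate: 50,          // starting iteration rate (illustrative)
      timeUnit: '1s',         // rates are per second, i.e. RPS
      preAllocatedVUs: 100,   // illustrative
      maxVUs: 1000,           // illustrative
      stages: [
        { target: 200, duration: '2m' }, // ramp up to 200 RPS
        { target: 400, duration: '2m' }, // ramp up to 400 RPS
        { target: 740, duration: '2m' }, // ramp up to the final 740 RPS
      ],
    },
  },
};

export default function () {
  http.get('https://backend.example.com/api/endpoint'); // placeholder endpoint
}
```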
I am using a Datadog dashboard with charts that track the RPS reported by k6 and the RPS reported by the backend system. On one chart, I am plotting the metrics k6.http_reqs and trace.http.request.hits.
While the test is running, the two metrics initially report the exact same values, but after a certain point k6.http_reqs plateaus, while trace.http.request.hits keeps growing, exactly matching the target RPS defined in the k6 plan.
As you can see in the following graph, the k6.http_reqs metric plateaus at about 380 RPS, while trace.http.request.hits goes all the way up to 740 RPS, as defined in my stages. I’ve also added the graph definition to the image. I would like to understand why this happens and whether there is any way to fix k6’s reporting.
It’s difficult to pinpoint the cause of this with any certainty, but here are some thoughts:
I see that you’re graphing “Successful RPS” as only responses with status 200, while the blue line in your chart shows “2xx”. Technically, a “successful” response could be any code from 200-299 (or even 3xx if you count redirects). So could it be that the discrepancy is due to the different filters?
Ah, after a closer look, those other responses would show up as “Failed RPS” in your graph, so never mind this…
Is there a proxy or load balancer between k6 and the backend system? If so, it could be that k6 receives a different response from the proxy while the backend returns successful responses. This would be counterintuitive unless the backend were failing, but maybe the proxy acts as a caching node and returns different responses past a certain load?
Like I said, these are just speculations, and it’s difficult to know without more details, or without being able to reproduce it directly.
One thing you could try is using another output instead of Datadog, to rule out issues with that integration. Do you also see this behavior in the k6 end-of-test summary (it would show a lower http_reqs count, and maybe some errors), or when you use another output (Prometheus, InfluxDB, JSON, etc.)?
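For example, you could dump the raw metric samples to a JSON file and compare the number of http_reqs entries against what the backend reports (assuming your script is in script.js):

```bash
# Write every metric sample to results.json; this works alongside or instead of the Datadog output.
k6 run --out json=results.json script.js
```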
Hi, thanks for the reply! There is indeed a load balancer: the service runs on Kubernetes with multiple replicas, and it also scales out automatically (creating new replicas) to keep up with high traffic levels. Scaling out does happen during the course of this test.
I’ve asked around, and it does seem that when the API is called from the outside, as k6 does here, there are several proxies in front of it.
The weird thing is that k6 doesn’t seem to report anything for those extra requests: no failures, no timeouts, nothing. I’ll look into these proxies, though, and try your other suggestions as well.