"http_duration_req" median and minimum decrease when load increase

Hello, I’ve been doing some really basic test with k6 on different http servers,
and I’ve noticed that when I increase the load (in request per second), the latency (http_duration_req metric) decrease and it seems quite odd.

Two scenarios for example:

"s0": {
    executor: 'constant-arrival-rate',
    rate: 100,
    duration: '100s',
    gracefulStop: '0s',
    preAllocatedVUs: 100,
    maxVUs: 200,
"s1": {
    executor: 'constant-arrival-rate',
    rate: 500,
    duration: '100s',
    gracefulStop: '0s',
    preAllocatedVUs: 500,
    maxVUs: 600,

And when i look at the summary at the end of each run, the median and the minimum for the “http_req_duration” metric is ALWAYS lower for the s1 than for the s0.

And I have the intuition that it should rather be the opposite.

Does anyone have an explanation?

I notice the same thing for different webservers, flask with Python, PHP server…, on the same machine or not.

Tens of tests after, it seems that the latency isn’t dependant on the simple “request rate” but on the number of concurrent requests on differents TCP connections. I’m still investigating

But still I dont get how can I get a lower latency while getting a higher requests rate.

Hi @fullfox,

I increase the load (in request per second), the latency (http_duration_req metric) decrease and it seems quite odd

Can you post some numbers so we can apply math for understanding better what is happening, please? You should consider that average could not be the a optimal metric for latency evaluation. You should use percentile (e.g p99), or at least consider the standard deviation.

Are you sure the system under load can keep a constant latency? Otherwise, the comparison is really difficult without seeing a chart. If you need a stable implementation you can use the https://httpbin.test.k6.io/delay/{delay} endpoint. Using it should help in sharing a reproducible test case.

Hi @codebien,
Here’s the summary for those two scenarios:

Scenario 0 (100rps)
Scenario 1 (500rps)

I’m pretty sure the system under load can keep a constant latency but I’ll try the endpoint you gave me.

as you can observe the minimum and the maximum values are quite different, so it’s fine to think that the metric could contain some noise in the set of values which means the latency’s distribution is not full constant. If you check the percentile you can observe a consistent value because it can mitigate better the noise.

You could graph your metric using an external output like Prometheus or InfluxDB for a better understanding of your test results.

I’ve made the following graphs with a flask and a php server:

For sure there’s a large std but still there’s a clear decreasing trend.
It’s really easy to reproduce, just run the two scenarios with k6 against a random webserver and I’m pretty sure you will get the same results.

(On my graph, the delay boxplots are just the distribution of the http_req_duration metric, the “requests per second (goal)” in the top are the “rate” attribute in the script, and the “requests per second (sent)” are the measured ones as the summary does (http_req/s).

Hey @fullfox ! I don’t want to comment on any technical reasons for the phenomenon; your data is counterintuitive to me, too. I’m sure some experts can help you understand what’s going on.

But, I ran my own test on the test.k6.io server, and I can confirm that, for at least my example, response time definitely increased with load increase.

This was the script I ran, you could try it yourself.

import http from "k6/http";
import { sleep } from "k6";

export const options = {
  stages: [
    { duration: "10m", target: 1000 },
    { duration: "5m", target: 1000 },
    { duration: "10m", target: 5000 },
    { duration: "5m", target: 5000 },
    { duration: "5m", target: 0 },

export default function () {

I ran this test on the k6 Cloud servers to visualize the results (public link to the test run). The test server took a surprising amount of load before latency increased, but once it happened, it happened sharply.

The median increase in latency doesn’t show very well, but you can see the graph for the p99 values has a pretty dramatic knee.

Unfortunately, I don’t know what’s happening on your servers, but at least I can provide a counter-example :slight_smile: .

1 Like

One big difference to note between our tests: I’m using the ramping-vus executor, and yours uses the constant-arrival-rate

From the docs on constant arrival rate.

Since iteration execution time can vary because of test logic or the system-under-test responding more slowly, this executor will try to compensate by running a variable number of VUs—including potentially initializing more in the middle of the test—to meet the configured iteration rate.

Thank you a lot for your counter example, it will help me investigating more.
Yeah I think it’s definitely linked to the constant-arrival-rate executor, I got the same observations as you when using a different one. And I also got an increasing req-duration-99p, it’s only about the min and the median.

I could not reproduce this phenomenon using the k6 cloud,
here’s my results:

It is definitely constant

I ran this test from my own machine on the same target http://example.org/, I got the same results as in the cloud.

Then i ran it against a webserver on the same local network and the phenomenon appeared again.

But the phenomenon is only about tens of microseconds so it may be blurred when the latency is about few tens of milliseconds.

I could not reproduce this phenomenon …

I’m not sure which phenomenon!

Let’s call the phenomenon of “latency falling while load rises” the weird phenomenon. I don’t think your cloud test shows the weird phenomenon. Rather, latency just stays flat. I’m guessing that the normal phenomenon of “latency rising with load” would show here too, but your test didn’t have enough load. Notice how my latency also stayed absolutely flat until it hit the knee around 2500 VUs.

I think you’re also right that it’s too hard to say with such small values and small tests.

Also, let’s not load test example.org! (Unless you own it). Instead, you can use your own site, or one of the dedicated k6 test servers, like test.k6.io.

The first scripts you ran definitely did show the weird phenomenon though. I think it must have been related to the constant-arrival-rate executor, which may react to changing latency. I don’t know why latency went down, though. I spoke with an engineer who suggested you might have gotten some kind of performance benefit from parallelism. But that was just an idle guess. It needs to be investigated.

If you discover the answer or continue to see the weird phenomenon with other executors, please follow up!

1 Like

Hi @fullfox have you considered that your system may be caching your test data which will almost always produce an artificially low roundtrip response time. In addition the backend or server response would cause your latency charts to drop. This typically would be due to your organic traffic not experience caching. Also the number of transactions generated by k6 could be dwarfing your non-synthetic traffic which would cause this statistical skew.

without more info this is only an opinionated guess.

good luck.


you might have gotten some kind of performance benefit from parallelism.

Yeah I definitely think it has to do with that. I can’t reproduce the “weird phenomenon” with the per-vu-iterations executor btw.

And yeah about my cloud test I meant I could not reproduce the “weird phenomenon”, as you just said, it stayed absolutely flat.

And I was just load testing with about 100rps but sure I’ll use test.k6.io next times.

@PlayStay Can you explain what you mean with caching data ? I’m not networking expert ^^

Thank you all guys for helping me, I’ll keep you updated

Hi @fullfox as an example of the impact of caching on performance testing see this screenshot of a latency before and during a load test. I won’t go into a long description of the various types of caching that can occur in a modern system. In short if you’re testing a system which caches requests, once your test data has been cached, you can expect the latency chart to show a decrease in measured/visualized latency.


Thank @PlayStay , but is caching happenning at the level of the webserver ? Or before by the OS ? Maybe I can prevent this behaviour with random payload in http request ?

Hi @fullfox
Generally it is recommended to have a pool of variable data inputs to avoid caching.
Especially if that step is querying or loading a resource.
I used to have 10 different data records that behaved the same and randomly use a different one on each iteration.
Nowadays I’d recommend having more than just 10, but that will definitely help. :slight_smile:


Hi @fullfox - @SrPerf makes a good point. To answer your question, yes caching occurs at the web server level and before depending on the make up of your system infrastructure/architecture. However there is another factor which “can possibly” result in the observation you’re seeing. Not only is variable data needed to ensure the request data (account, token, whichever) is not stored in cache. which usually uses the most resources and is an end to end transaction (client → Service → DB); but you may want to consider that the payload does not exactly match a real browser or api call through you system. which means even if your request is not cache’d, the data being fetched is “happy path” rather than pseudo random and worst case or real case.

here’s an example. if your system under normal user conditions measure an average response/latency of 100ms, you as a perf engineer should craft 2 scenarios. 1 just get it working happy path, run as much lightweight load as possible to stress the host and infrastructure resources. 2 a test which when executed once with one VU returns the same 100ms response/latency as a normal user. Now what you have is a payload which when executed under load annihilates just your service and/or system bottleneck as if the load client were a normal user.

But it’s easier to blame caching :slight_smile: rather than blame your test and data setup for this observation. In short the issue you’re seeing is a problem depending on your expectations, the system stability and resilience requirements and/or understanding of the system architecture.

Hope this helps. sorry for being vague, but I think you’re ok. You just need to understand what you’re after. A working load test that throws packets down the wire to stress your system resources or a function request that accurately mimics the user then scale that request out to your expected user volume or rate.

1 Like