Bug with k6 Threshold Validation

I noticed something today with k6 when running some tests and wanted to ask what is going on and whether there is any way for me to remedy it. When we run our performance tests, we set an upper and a lower bound as thresholds on the median and p(90) times. Today I was printing the raw JSON data from a test run to the console to look at it. Initially, all of the threshold statuses were accurate (i.e. if the median time was greater than our upper bound, the threshold for the upper bound on the median was reported as not ok). However, as I continued running a few more tests later on, I noticed that k6 seemed to be marking the wrong thresholds as failed.

In the screenshot pasted below, the median time for the test was 457.84. Our lower bound on the median was 39 and our upper bound was 95. Since 457.84 is greater than our lower bound of 39, that threshold should be marked as "ok": true, and because 457.84 is not less than our upper bound of 95, that threshold should be marked as "ok": false. However, as you can see, the opposite happened: "med>39" is marked as "ok": false and "med<95" is marked as "ok": true. Let me know if I can provide any other information. Thank you for your help in advance!

[Screenshot: mismarked_thresholds]
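For anyone who can't view the screenshot, the relevant part of the summary JSON looked roughly like the sketch below. The values are the ones described above; the exact shape of k6's summary output may differ slightly from this.

"example_trend_response_time": {
    "values": {
        "med": 457.84
    },
    "thresholds": {
        "med>39": { "ok": false },
        "med<95": { "ok": true }
    }
}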

Hi @jill.lombardi :wave:

At first glance it does indeed look like a bug, but let's try to confirm that. It would be really helpful if you could indicate the k6 version you're using, and also provide us with an anonymized, rough example of the script you're running. That way we can cross-check and try to reproduce the bug as quickly as possible.
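If it's easier, running the Docker image with the version subcommand should print the version directly, since the image's entrypoint is the k6 binary; something along these lines:

docker run --rm loadimpact/k6 version

The output of that command is exactly the information we're after.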

If we’re able to reproduce it, we’ll likely open a GitHub issue to track it and its future resolution.

Thanks a lot for your help :bowing_man:

Hello! For the version of k6, we are starting a Docker container after running docker pull loadimpact/k6. Hopefully this info is what you need, but let me know if there's another place I could look for the version. Additionally, here is a rough example of what we are running:

import http from 'k6/http';
import { Trend, Rate } from 'k6/metrics';
// textSummary is used by handleSummary() below
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

export let options = {
    thresholds: {
        'example_trend_response_time': ['med>39', 'med<95'],  
        'example_rate_successful_requests': ['rate==1'],
    },
    scenarios: {
        exampleScenario: {
          executor: 'constant-arrival-rate',
          exec: 'exampleFunction',
          rate: 1,
          timeUnit: '30s',
          duration: '30s',
          preAllocatedVUs: 2,
        },
    },
};

const putExampleTrend = new Trend('example_trend_response_time');
const putExampleSuccessRate = new Rate('example_rate_successful_requests');

export function exampleFunction() {
    // placeholder CSRF token for this anonymized example
    const fakeToken = 'fake-csrf-token';

    let examplePUT = http.put(`https://fake/endpoint`, {}, { headers: { 'x-csrf-token': fakeToken } });

    // exampleValidationFunction() (defined elsewhere in our project) outputs logging
    // messages to the console and gets the response time and status of the request
    let examplePUTInfo = exampleValidationFunction(examplePUT, 200, exampleFunction);
    putExampleTrend.add(examplePUTInfo.responseTime);
    putExampleSuccessRate.add(examplePUTInfo.successRate == 200);
}

export function handleSummary(data) {
    // print the raw end-of-test summary data as JSON
    console.log(JSON.stringify(data, null, 2));
    return {
        'stdout': textSummary(data, { indent: ' ', enableColors: true }),
    };
}

Let me know if I can provide you with any additional information! Thanks so much for the response!!

That’s super helpful, @jill.lombardi, thanks a lot for that :bowing_man:

I’m currently working on trying to reproduce the issue, I’ll let you know once I know more :slight_smile:

Hi @jill.lombardi

After some debugging, I can confirm I’m able to reproduce the behavior you observed, and this is very likely to be a bug in the current version of k6. I have documented it in a GitHub issue.

The team hasn’t prioritized it yet, but since it is a bug, I expect we will work on it as soon as possible. I’ll let you know as soon as I have more visibility into when we expect this to be fixed :handshake:

Hi again @oleiade! Thanks so much for the update, I appreciate it! Looking forward to hearing from you again! :wave:

Hi @oleiade! I just took a look at the GitHub issue you created and wanted to add that I have also noticed this same issue with some of the p(90) thresholds that we have.

Hi @jill.lombardi

Thanks for the heads-up :bowing_man: As you might have read in the GitHub issue, we have traced the cause of the issue. We have prioritized its resolution for version 0.42, coming at the beginning of 2023.

In the meantime, we believe you should be able to work around your issue by using the 50th percentile, p(50), instead of med in your scripts :+1:

Hi @oleiade

Ah, gotcha… What would be the best way to implement this? I just tried switching out med for p(50) in these lines:

export let options = {
    thresholds: {
        'example_trend_response_time': ['p(50)>39', 'p(50)<95'],  
        'example_rate_successful_requests': ['rate==1'],
    },

And when I ran this, the thresholds returned as undefined.

Hi @jill.lombardi

We have opened a Pull Request implementing a fix for your issue. This was indeed a bug in our thresholds evaluation engine. We are still deciding if we will produce a v0.41.1 version for it, or if it will land in v0.42.0. I’ll let you know as soon as I have more information on that front.

If you have the time and feel comfortable enough in Go to do that, I’d really appreciate it if you could try the Pull Request branch and tell us if it works as expected from your perspective :bowing_man:
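For reference, trying out the branch would roughly involve the steps below; the branch name is just a placeholder for the one referenced in the Pull Request:

git clone https://github.com/grafana/k6.git
cd k6
git fetch origin <pr-branch>
git checkout <pr-branch>
go build .      # produces a local ./k6 binary
./k6 run your-script.js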

Hi @oleiade

Unfortunately I am using this for work purposes and my team does not feel comfortable using unreleased software :pensive: Thank you for all of your work on this nonetheless!

Hi @oleiade !

I just noticed a pattern with our results today that I thought I would share here. I am not sure if this additional info will be useful to you since you seem to have pinpointed the bug already, but I thought I would mention it anyway.

We have a few thresholds where we are checking both the med and the p(90). For example, such a threshold looks like: 'example_trend_response_time': ['med>39', 'med<95', 'p(90)>30', 'p(90)<100']. A threshold like this one appears to properly validate both the med and the p(90) expressions. We are only seeing the incorrect validation when the threshold includes only med: 'example_trend_response_time': ['med>39', 'med<95'] (see the comparison below).
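To make the comparison easier to read, the two configurations roughly look like this (metric names anonymized as before):

thresholds: {
    // both the med and p(90) expressions validate as expected
    'example_trend_response_time': ['med>39', 'med<95', 'p(90)>30', 'p(90)<100'],
},

thresholds: {
    // med-only expressions: this is where we see the swapped "ok" statuses
    'example_trend_response_time': ['med>39', 'med<95'],
},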

In addition, I wanted to mention that I was not able to get the workaround of substituting p(50) for med to work.

Interesting! Thanks for collecting this information and getting it back to us :pray:

Considering the bug we have spotted, the additional behavior regarding med that you describe is expected. The handling of the median and of percentiles is tightly coupled in k6, which is what led to the specific issue you ran into, and I would attribute the scenario you pointed out to that same bug as well.

Good news though: the fix for this bug has been merged into k6 master, and it has been decided that it will land in v0.42.0, due around mid-December :slight_smile:

Also, regarding p(50), we would expect it to work as intended, but I will need to run some tests and will get back to you :+1:

Hey @jill.lombardi

Just a heads-up that I’ve had a pretty heavy workload these last few days and didn’t get to experiment further, but I’m not forgetting this, and I shall come back to it in the next couple of days :bowing_man:

Hi @jill.lombardi

Just a heads-up that k6 version 0.42 is out, and it contains the fix for this specific issue :tada:
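If you’re pulling the image from Docker Hub, grabbing the new release should look something like the commands below (note that the image now lives under the grafana organization; the exact tag depends on which 0.42 release you pick up):

docker pull grafana/k6:0.42.0
docker run --rm grafana/k6:0.42.0 version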

Amazing! Thank you so much for your work on this :slight_smile:
