K6 failed to upload metrics due to canceled context (OTEL)

Hi,

I noticed that the last batch of metrics fails to upload to an Otel Collector.
Both gRPC and HTTP protocols have the same behavior.

To replicate, set the export interval to 1 second with K6_OTEL_EXPORT_INTERVAL=1. (With the default 10s interval, nothing is logged when the test completes, but none of the metrics from the final batch are sent.)

Example: .\k6 run -e K6_OTEL_EXPORT_INTERVAL=1 -e K6_OTEL_GRPC_EXPORTER_INSECURE=true -e K6_OTEL_GRPC_EXPORTER_ENDPOINT=0.0.0.0:4317 -e K6_OTEL_EXPORTER_TYPE=grpc -o experimental-opentelemetry --vus 3 --duration 10s test.js

gRPC logs:
INFO[0000] Setting up source=console
INFO[0017] Tearing down source=console
INFO[0020] 2024/10/14 17:46:58 failed to upload metrics: context canceled: rpc error: code = Canceled desc = context canceled
INFO[0023] Handling summary source=console

HTTP logs:
INFO[0000] Setting up source=console
INFO[0017] Tearing down source=console
INFO[0020] 2024/10/14 17:00:44 failed to upload metrics: Post "http://0.0.0.0:4318/v1/metrics": context canceled
INFO[0023] Handling summary source=console

Please advise how to resolve this.

Hi @kiksplx !

Welcome to the community forums! :wave:

I believe, in that case, k6 doesn’t control the metrics upload. It delegates it entirely to the OTEL SDK. By checking it, I see that there is a configuration option:

// WithTimeout configures the time a PeriodicReader waits for an export to
// complete before canceling it. This includes an export which occurs as part
// of Shutdown or ForceFlush if the user passed context does not have a
// deadline. If the user passed context does have a deadline, it will be used
// instead.
//
// This option overrides any value set for the
// OTEL_METRIC_EXPORT_TIMEOUT environment variable.
//
// If this option is not used or d is less than or equal to zero, 30 seconds
// is used as the default.

Right now, we don’t provide a way to override it on the k6 side, so you could try setting OTEL_METRIC_EXPORT_TIMEOUT directly.

Let me know if that helps!

Cheers!

Hi @olegbespalov! Thanks for your response.

I tried using OTEL_METRIC_EXPORT_TIMEOUT, but it had no impact. Even just testing 1 VU with 100 iterations, the report summary will confirm the completion of the 100 iterations, but the OTLP receivers will miss 1-2% of them, and you get this log at the end:

*INFO[0006] 2024/10/21 14:53:25 failed to upload metrics: context canceled: rpc error: code = Canceled desc = context canceled*

Are you able to replicate this issue on your end?

Please let me know if you need more information. I appreciate your help.

Hi @kiksplx

After looking again at your original message, I believe the issue is that you don’t specify a unit for the export interval in K6_OTEL_EXPORT_INTERVAL=1. Without a unit, this resolves not to 1s (second) but to 1ms (millisecond), which is too short.

Could you please try using K6_OTEL_EXPORT_INTERVAL=1s?

Hope that helps.

Hi @olegbespalov,

Ah yes, you’re right, it needs the unit.
I tried passing K6_OTEL_EXPORT_INTERVAL=1s, but unfortunately more metrics were not sent (~80 of 100). I also tested using the default (10s) but lost more metrics (~60 of 100).

It seems that the increased duration of the K6_OTEL_EXPORT_INTERVAL results in more metrics potentially being lost at the end.

I’m guessing that the k6 exporter ends as soon as the main function exits and so fails to push the remaining metrics to OTLP. Thoughts?

I tried adding the --linger option to see whether it would complete the push, but it did not work.

@kiksplx

> I’m guessing that the k6 exporter ends as soon as the main function exits, thus missing to push the remaining metrics to OTLP. Thoughts?

Do you still see the message in the logs?

*INFO[0006] 2024/10/21 14:53:25 failed to upload metrics: context canceled: rpc error: code = Canceled desc = context canceled*

> I tried passing K6_OTEL_EXPORT_INTERVAL=1s unfortunately more metrics were not sent (~80 of 100). I also tested using the default (10s) but lost more metrics (~60 of 100).
> It seems that the increased duration of the K6_OTEL_EXPORT_INTERVAL results in more metrics potentially being lost at the end.

Is there a way to get the script that can be used to reproduce this?

Hi @olegbespalov,

> Do you still see the message in the logs? (failed to upload metrics: context canceled)

I don’t. This has not happened since I applied your recommendation to increase K6_OTEL_EXPORT_INTERVAL to at least 1s. Thank you.

> Is there a way to get the script that can be used to reproduce this?

Yes. I was hoping you could confirm if you can reproduce it.

Here’s the test script:

import http from 'k6/http';
import { sleep } from 'k6';

export default function () {
    http.get('https://test-api.k6.io/public/crocodiles/');
    sleep(1);
}

and command:

./k6 run -e K6_OTEL_METRIC_PREFIX=k6_ -e K6_OTEL_GRPC_EXPORTER_INSECURE=true -e K6_OTEL_GRPC_EXPORTER_ENDPOINT=0.0.0.0:4317 .\test.js -o experimental-opentelemetry --vus 1 --iterations 100

To reproduce the problem, I first need to understand what exactly I should reproduce. After fixing the export interval, the error is gone; I see that on my end, and you just confirmed it. If an issue still remains, I need more details, such as the script and a description of the exact problem.

Like:

> I tried passing K6_OTEL_EXPORT_INTERVAL=1s unfortunately more metrics were not sent (~80 of 100). I also tested using the default (10s) but lost more metrics (~60 of 100).

Which metrics did you find were lost? How do you measure the loss? All that said, in order to help you, I need as many details as you can provide.

Hi @olegbespalov,

I think it’s working now, please ignore my previous findings. I’ve marked your recommendation as the solution.

Thank you for being so helpful.
