I’m planning to use k6-operator with AWS Karpenter for auto-scaling our load testing infrastructure. Based on some GitHub issues I’ve seen (like the initialization timeout problems), I want to make sure I configure everything properly from the start.
My setup:
EKS cluster with Karpenter for node provisioning
Custom k6 image (~400MB) with xk6-browser extensions
Planning to run distributed load tests with multiple runner pods
Questions:
Node provisioning delays: Are there any specific Karpenter NodePool configurations recommended for k6 workloads? I’m concerned about the 60-second initialization timeout when new nodes need to be provisioned.
Pod annotations: I’ve seen mentions of karpenter.sh/do-not-disrupt annotation, but there seems to be a Helm chart issue (#477). What’s the current workaround for preventing node disruption during tests?
Image optimization: Any best practices for optimizing large k6 images with browser capabilities? Should I pre-pull images or use specific image pull policies?
Resource requests/limits: What CPU/memory requests work well with Karpenter’s provisioning decisions for k6 runners?
Startup probe configuration: Should I adjust probe settings to account for longer pod startup times on new nodes?
Has anyone successfully run k6-operator at scale with Karpenter? I’d appreciate any lessons learned or configuration examples.
We do not have any ready-made guide on Karpenter, sadly. But I’ve done some k6 testing with it, and IMO, in the general case it’d require some tweaking to get a smooth experience. However, this also heavily depends on the type of testing you’re going to run.
Node provisioning delays
I’m concerned about the 60-second initialization timeout when new nodes need to be provisioned.
Which timeout specifically do you mean here? If it’s about the node becoming ready, then in the general case it won’t matter for the k6-operator test itself; it only means you might have to wait longer for the test to even start. The k6-operator doesn’t have any timeouts of its own at the moment: it’ll wait indefinitely.
Pod annotations: I’ve seen mentions of karpenter.sh/do-not-disrupt annotation, but there seems to be a Helm chart issue (#477)
Karpenter disruptions are certainly an issue for the quality of k6 tests, so they should be prevented. But I don’t follow what the problem with the annotation is: could you link the issue you mean here?
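For what it’s worth, the annotation itself can be set on the runner pods through the TestRun spec. A minimal sketch, assuming the runner.metadata field is available in your k6-operator version; the resource name, script ConfigMap and parallelism are placeholders:

```yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: load-test                  # placeholder name
spec:
  parallelism: 4
  script:
    configMap:
      name: load-test-script       # hypothetical ConfigMap holding the test script
      file: test.js
  runner:
    metadata:
      annotations:
        # Asks Karpenter not to voluntarily disrupt the node while this pod is running.
        karpenter.sh/do-not-disrupt: "true"
```

Keep in mind this only blocks Karpenter’s voluntary disruptions (consolidation, drift, expiration); it doesn’t protect you from involuntary ones such as Spot interruptions.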
Image optimization
Unless there is a reason to use a custom image, I’d recommend using the official k6 image. It’s a bit smaller too: closer to 300 MB than 400 MB.
As for pre-pulling: I think it depends on the type of testing. For example, if you’re going to run a large browser test that requires lots of new nodes every few hours, you might indeed want to pre-pull images. In comparison, if it’s once a week, it’s probably fine to pull images as usual. As for how exactly to do that: such caching strategies are outside the scope of k6 and k6-operator, so the general rules apply. You can pre-build node images, for example, or use additional tooling to pre-pull images (see the sketch below). Either way, it certainly makes sense to plan for the testing needs ahead.
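One common piece of “additional tooling” is a DaemonSet whose only job is to pull the image onto every node. A rough sketch with a hypothetical image name; note that on nodes Karpenter creates on demand this only starts pulling once the node is up, so baking the image into the node image/AMI is the more effective option for that case:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: k6-image-prepuller
spec:
  selector:
    matchLabels:
      app: k6-image-prepuller
  template:
    metadata:
      labels:
        app: k6-image-prepuller
    spec:
      initContainers:
        - name: pull-k6
          image: your-registry/k6-with-browser:v0.x   # hypothetical custom image
          command: ["k6", "version"]                  # exits immediately; the pull itself is the point
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9            # tiny placeholder that keeps the pod Running
```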
As for imagePullPolicy: by default, Kubernetes sets it to IfNotPresent, unless the image uses the :latest tag (or no tag at all), in which case it defaults to Always. So, as long as your image is tagged, you might not even need to change it.
Resource requests/limits
Since we’re talking about browsers, this is an open question at the moment. See this issue:
Briefly, browser resource usage heavily depends on your website. There are no general recommendations that will fit every website and every kind of test. k6-browser is, of course, significantly heavier than plain k6: for example, see how much CPU and memory an everyday Chromium instance takes, even when idle. Something similar happens with the k6 browser extension. I very much recommend monitoring your runners from the beginning and tuning resource usage accordingly.
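As a purely illustrative starting point (the numbers are placeholders, not a recommendation), the runner resources can be set on the TestRun and then adjusted based on what your monitoring shows:

```yaml
spec:
  runner:
    resources:
      requests:
        cpu: "1"            # placeholder: browser tests usually need noticeably more than plain k6
        memory: 2Gi
      limits:
        memory: 4Gi         # placeholder: watch for OOMKills and tune from real measurements
```

Explicit requests also help Karpenter here, since its provisioning decisions are based on the pods’ resource requests.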
Startup probe configuration
Do you mean setting a startup probe for the k6 runners? k6-operator doesn’t actually support that right now; only liveness and readiness probes can be set. On the plus side, this means the startup probe has never come up as an issue for anyone. So tweaking the liveness and readiness probes, as needed, should be sufficient.
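For reference, the runner pods expose the k6 REST API on port 6565, which is what the default probes check. A sketch of relaxing the probes if the runners take a while to start responding (the timings are placeholders, and the runner.readinessProbe / runner.livenessProbe fields are assumed to be available in your k6-operator version):

```yaml
spec:
  runner:
    readinessProbe:
      httpGet:
        path: /v1/status
        port: 6565
      initialDelaySeconds: 30   # placeholder: give k6 extra time before the first check (e.g. heavy browser init)
      periodSeconds: 10
      failureThreshold: 6
    livenessProbe:
      httpGet:
        path: /v1/status
        port: 6565
      initialDelaySeconds: 30
      periodSeconds: 10
```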