K6 on k8s with large script not starting

Hi!
We faced a strange k6 behaviour in k8s kluster. When test script is more than 500 kb, job become failed. At the same time the same script running on Windows PC goes successfully. We turn on verbosity and we can see the log:

time="2023-06-12T16:01:56Z" level=debug msg="Logger format: TEXT"

time="2023-06-12T16:01:56Z" level=debug msg="k6 version: v0.44.0 (2023-04-24T10:35:38+0000/v0.44.0-0-g14d80f6f, go1.20.3, linux/amd64)"
time="2023-06-12T16:01:56Z" level=debug msg="Resolving and reading test '/test/test.js'..."
time="2023-06-12T16:01:56Z" level=debug msg=Loading... moduleSpecifier="file:///test/test.js" originalModuleSpecifier=/test/test.js
time="2023-06-12T16:01:56Z" level=debug msg="'/test/test.js' resolved to 'file:///test/test.js' and successfully loaded 804763 bytes!"
time="2023-06-12T16:01:56Z" level=debug msg="Gathering k6 runtime options..."
time="2023-06-12T16:01:56Z" level=debug msg="Initializing k6 runner for '/test/test.js' (file:///test/test.js)..."
time="2023-06-12T16:01:56Z" level=debug msg="Detecting test type for..." test_path="file:///test/test.js"
time="2023-06-12T16:01:56Z" level=debug msg="Trying to load as a JS test..." test_path="file:///test/test.js"

time="2023-06-12T16:01:57Z" level=warning msg="The source for `file:///test/test.js` needs to go through babel but is over 256000 bytes. For performance reasons source map support will be disabled for this particular file."

After the line about file source - nothing more. See number 1 on a screenshot.

K6 custom resource looks like this:

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: student-production-2-parallel-1
  namespace: qa-hot
spec:
  arguments: --out experimental-prometheus-rw --tag testid=20230612-2 --verbose
  parallelism: 1
  runner:
    env:
    - name: K6_PROMETHEUS_RW_SERVER_URL
      value: http://prom-2-44.default:9090/api/v1/write
    - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
      value: "true"
    - name: K6_PROMETHEUS_RW_TREND_STATS
      value: p(90),p(95),p(99),max
  script:
    configMap:
      file: test.js
      name: student-production-2

See number 2 on a screenshot.

kubectl describe pod you can see on screenshot, number 3.

Container isn’t oomkilled, node has enough system resources.
Based on these logs we think when k6 tries to process script file over 500 kb it’s getting stuck, fails healthchecks and then container getting killed by k8s.
When script file is less 500 kb test goes successsfully, k6 operator works fine.

So here is the questions:

  1. Am I correct with understanding of failure process?
  2. What means Exit code: 2 (last screen)? How can we debug this error?
  3. If I am correct about failure process, is there the way to change healthchecks for the runner? Or what else we can do to get rid of this issue?

Hi @dlight, welcome to the forum :wave:

This is an interesting case. Yes, it does appear as you described… To confirm, could you please run your script with k6 run --verbose ... --log-format json and check the timestamps between warning about Babel and the next log lines? Esp. these lines which should appear somewhere after that warning:

{"level":"debug","msg":"Babel: Transformed","t":42430552,"time":"2023-06-16T00:06:18+03:00"}
...
{"level":"debug","msg":"Starting the REST API server on localhost:6565","time":"2023-06-16T00:06:18+03:00"}
1 Like

Hi, @olha !
There is no log lines after Babel warning:

{"level":"debug","msg":"Logger format: JSON","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"k6 version: v0.44.0 (2023-04-24T10:35:38+0000/v0.44.0-0-g14d80f6f, go1.20.3, linux/amd64)","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"Resolving and reading test '/test/test.js'...","time":"2023-06-16T09:33:54Z"}
{"level":"debug","moduleSpecifier":{"Scheme":"file","Opaque":"","User":null,"Host":"","Path":"/test/test.js","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"msg":"Loading...","originalModuleSpecifier":"/test/test.js","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"'/test/test.js' resolved to 'file:///test/test.js' and successfully loaded 804763 bytes!","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"Gathering k6 runtime options...","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"Initializing k6 runner for '/test/test.js' (file:///test/test.js)...","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"Detecting test type for...","test_path":"file:///test/test.js","time":"2023-06-16T09:33:54Z"}
{"level":"debug","msg":"Trying to load as a JS test...","test_path":"file:///test/test.js","time":"2023-06-16T09:33:54Z"}
{"level":"warning","msg":"The source for `file:///test/test.js` needs to go through babel but is over 256000 bytes. For performance reasons source map support will be disabled for this particular file.","time":"2023-06-16T09:33:55Z"} 

Once we faced failure even on test scrips less then 500 kb:


time="2023-06-11T16:33:33Z" level=debug msg="Logger format: TEXT"
time="2023-06-11T16:33:33Z" level=debug msg="k6 version: v0.44.0 (2023-04-24T10:35:38+0000/v0.44.0-0-g14d80f6f, go1.20.3, linux/amd64)"
time="2023-06-11T16:33:33Z" level=debug msg="Resolving and reading test '/test/test.js'..."
time="2023-06-11T16:33:33Z" level=debug msg=Loading... moduleSpecifier="file:///test/test.js" originalModuleSpecifier=/test/test.js
time="2023-06-11T16:33:33Z" level=debug msg="'/test/test.js' resolved to 'file:///test/test.js' and successfully loaded 366399 bytes!"
time="2023-06-11T16:33:33Z" level=debug msg="Gathering k6 runtime options..."
time="2023-06-11T16:33:33Z" level=debug msg="Initializing k6 runner for '/test/test.js' (file:///test/test.js)..."
time="2023-06-11T16:33:33Z" level=debug msg="Detecting test type for..." test_path="file:///test/test.js"
time="2023-06-11T16:33:33Z" level=debug msg="Trying to load as a JS test..." test_path="file:///test/test.js"
time="2023-06-11T16:33:34Z" level=warning msg="The source for `file:///test/test.js` needs to go through babel but is over 256000 bytes. For performance reasons source map support will be disabled for this particular file."
time="2023-06-11T16:34:01Z" level=debug msg="Babel: Transformed" t=26.954266931s

Here is log (cropped) from successful run:

time="2023-06-12T15:56:50Z" level=debug msg="Logger format: TEXT"
time="2023-06-12T15:56:50Z" level=debug msg="k6 version: v0.44.0 (2023-04-24T10:35:38+0000/v0.44.0-0-g14d80f6f, go1.20.3, linux/amd64)"
time="2023-06-12T15:56:50Z" level=debug msg="Resolving and reading test '/test/test.js'..."
time="2023-06-12T15:56:50Z" level=debug msg=Loading... moduleSpecifier="file:///test/test.js" originalModuleSpecifier=/test/test.js
time="2023-06-12T15:56:50Z" level=debug msg="'/test/test.js' resolved to 'file:///test/test.js' and successfully loaded 366399 bytes!"
time="2023-06-12T15:56:50Z" level=debug msg="Gathering k6 runtime options..."
time="2023-06-12T15:56:50Z" level=debug msg="Initializing k6 runner for '/test/test.js' (file:///test/test.js)..."
time="2023-06-12T15:56:50Z" level=debug msg="Detecting test type for..." test_path="file:///test/test.js"
time="2023-06-12T15:56:50Z" level=debug msg="Trying to load as a JS test..." test_path="file:///test/test.js"
time="2023-06-12T15:56:51Z" level=warning msg="The source for `file:///test/test.js` needs to go through babel but is over 256000 bytes. For performance reasons source map support will be disabled for this particular file."
time="2023-06-12T15:57:18Z" level=debug msg="Babel: Transformed" t=26.921130013s
time="2023-06-12T15:57:18Z" level=debug msg="Runner successfully initialized!"
time="2023-06-12T15:57:18Z" level=debug msg="Parsing CLI flags..."
time="2023-06-12T15:57:18Z" level=debug msg="Consolidating config layers..."
...
time="2023-06-12T16:01:52Z" level=debug msg="Usage report sent successfully"
time="2023-06-12T16:01:52Z" level=debug msg="Everything has finished, exiting k6 normally!"

Also, I can provide you log from successful run of 800 kb script from Windows:

time="2023-06-09T05:45:18Z" level=debug msg="Logger format: TEXT"
time="2023-06-09T05:45:18Z" level=debug msg="k6 version: v0.43.1 (2023-02-27T10:53:03+0000/v0.43.1-0-gaf3a0b89, go1.19.6, windows/amd64)"

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

time="2023-06-09T05:45:18Z" level=debug msg="Resolving and reading test 'student-production-over-500.js'..."
time="2023-06-09T05:45:18Z" level=debug msg=Loading... moduleSpecifier="file:///c:/AgileWork/K6%20Scripts/student-production-over-500.js" originalModuleSpecifier=student-production-over-500.js
time="2023-06-09T05:45:18Z" level=debug msg="'student-production-over-500.js' resolved to 'file:///c:/AgileWork/K6%20Scripts/student-production-over-500.js' and successfully loaded 589622 bytes!"
time="2023-06-09T05:45:18Z" level=debug msg="Gathering k6 runtime options..."
time="2023-06-09T05:45:18Z" level=debug msg="Initializing k6 runner for 'student-production-over-500.js' (file:///c:/AgileWork/K6%20Scripts/student-production-over-500.js)..."
time="2023-06-09T05:45:18Z" level=debug msg="Detecting test type for..." test_path="file:///c:/AgileWork/K6%20Scripts/student-production-over-500.js"
time="2023-06-09T05:45:18Z" level=debug msg="Trying to load as a JS test..." test_path="file:///c:/AgileWork/K6%20Scripts/student-production-over-500.js"
time="2023-06-09T05:45:18Z" level=warning msg="The source for `file:///c:/AgileWork/K6%20Scripts/student-production-over-500.js` needs to go through babel but is over 256000 bytes. For performance reasons source map support will be disabled for this particular file."
time="2023-06-09T05:45:28Z" level=debug msg="Babel: Transformed" t=10.3230243s
time="2023-06-09T05:45:28Z" level=debug msg="Runner successfully initialized!"
time="2023-06-09T05:45:28Z" level=debug msg="Parsing CLI flags..."
time="2023-06-09T05:45:28Z" level=debug msg="Consolidating config layers..."
...
running (26m08.4s), 0/4 VUs, 0 complete and 3 interrupted iterations
contacts ✓ [ 100% ] 3/4 VUs  4m0s
time="2023-06-09T06:11:37Z" level=debug msg="Everything has finished, exiting k6 normally!"

Thanks! Going by these logs, it seems that in case of large script, Babel transformation on Kubernetes node is simply taking too long (more than 5 minutes?) and HTTP server doesn’t even manage to start. It’s also very curious that the transformation is so fast on Windows :smile:

I don’t have a script to test this at the moment, but it makes sense to try get rid off failing readiness / liveness probes and see if that helps. You can do that by passing something like this to yaml spec:

  runner:
    livenessProbe:
      httpGet:
        path: /v1/status
        port: 6565
      initialDelaySeconds: 60
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /v1/status
        port: 6565
      initialDelaySeconds: 60
      periodSeconds: 10

(Liveness probe may be enough but I’m posting both as an example)
These options accept any “standard” parameter, e.g. failureThreshold. Hope that helps!

1 Like

@olha operator just ignores these parameters.

I’m adding probes to runner spec:

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: student-production-6
  namespace: qa-hot
spec:
  arguments: --out experimental-prometheus-rw --tag testid=20230620-3 --verbose --log-format json
  parallelism: 1
  runner:
    livenessProbe:
      httpGet:
        path: /v1/status
        port: 6565
      initialDelaySeconds: 180
      periodSeconds: 30
      failureThreshold: 6
      successThreshold: 1
    readinessProbe:
      httpGet:
        path: /v1/status
        port: 6565
      initialDelaySeconds: 180
      periodSeconds: 30
      failureThreshold: 6
      successThreshold: 1
    env:
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: http://prom-2-44.default:9090/api/v1/write
      - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
        value: 'true'
      - name: K6_PROMETHEUS_RW_TREND_STATS
        value: p(90),p(95),p(99),max
  script:
    configMap:
      file: test.js
      name: student-production-2

operator start working, but just exclude these lines from yaml:

so job is starting working with default healthchecks and falls as usual.

Perhaps there are some maximum values of these parameters?

They should be there… Both in K6 spec and in pods definition. @dlight, these probes were added just recently - could it be that’s its not the latest version of k6-operator? Sometimes cache does that trick of using older image when latest tag is used.

Hi @dlight, I’ve just now noticed that this change in k6-operator that allows to pass liveness & readiness probes has no been included in latest :sweat_smile: So it was part of 63bc8a336 commit and it was available only since that image here. In other words, before today, you needed to run the operator with that image or a later one in order to be able to specify liveness & readiness probes.

Not anymore: just now I’ve built a 0.0.10rc2 with new latest so now this functionality is available as latest or controller-v0.0.10rc2 tag. I suggest to pull this newest image.

Sorry for the misinformation! Hope that helps.

1 Like

Hi @olha, so what we should update? Operator itself or only pull latest ghcr.io/grafana/operator:latest-runner?
If we need to update operator - I don’t see any update instructions in docs. Should I do make delete, then git pull and make deploy again? Or there is something like make update for this purpose?

@dlight, just update the operator itself. You can do it with:

make delete
IMG=ghcr.io/grafana/operator:controller-v0.0.10rc2 make deploy
1 Like