Affinity issue when using nodeselector + parallelism + separate

I am running into a weird issue with node affinity, the separate flag, and parallelism.

Whenever I set the nodeselector value for my pods, they get stuck in a Pending state claiming that the node affinity is not met…

If I set the parallelism value to 1 with the separate flag set to true + nodeselector, everything works fine.
If I set the parallelism value to more than 1 with the separate flag set to false + nodeselector, everything works fine.
If I set the parallelism value to more than 1 with the separate flag set to true and without a nodeselector, everything works fine.

However, if I set the parallelism value to more than 1 with the separate flag set to true + nodeselector, the starter never executes and all the pods are stuck in Pending.

  initializer:
    nodeselector:
      node: k6
  starter:
    nodeselector:
      node: k6
  runner:
    nodeselector:
      node: k6
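
For reference, the custom resource I'm applying looks roughly like this (field values other than the ones quoted above are reconstructed from the generated pod below, so treat the exact names as placeholders):

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: timestamp
  namespace: k6
spec:
  parallelism: 2        # more than 1
  separate: true        # one runner per node
  script:
    configMap:
      name: timestamp-tyk-configmap   # mounted at /test in the runner pod
      file: timestamp.js
  runner:
    nodeselector:
      node: k6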

Here is one of the pod definitions:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cloud.google.com/cluster_autoscaler_unhelpable_since: 2023-08-12T15:01:06+0000
    cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
  creationTimestamp: "2023-08-12T15:01:03Z"
  finalizers:
  - batch.kubernetes.io/job-tracking
  generateName: timestamp-2-
  labels:
    app: k6
    batch.kubernetes.io/controller-uid: 99b6f5e8-35a5-49b7-b842-9853d67e2061
    batch.kubernetes.io/job-name: timestamp-2
    controller-uid: 99b6f5e8-35a5-49b7-b842-9853d67e2061
    job-name: timestamp-2
    k6_cr: timestamp
    runner: "true"
  name: timestamp-2-9c5zf
  namespace: k6
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: timestamp-2
    uid: 99b6f5e8-35a5-49b7-b842-9853d67e2061
  resourceVersion: "141199"
  uid: 6cfe5a60-de09-4d3b-b2bf-6800c45c3eb0
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - k6
          - key: runner
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
  automountServiceAccountToken: true
  containers:
  - command:
    - k6
    - run
    - --execution-segment=1/4:2/4
    - --execution-segment-sequence=0,1/4,2/4,3/4,1
    - --out
    - experimental-prometheus-rw
    - --tag
    - testid=tyk-timestamp-keyless-6aUq31pW8f
    - /test/timestamp.js
    - --address=0.0.0.0:6565
    - --paused
    - --tag
    - instance_id=2
    - --tag
    - job_name=timestamp-2
    env:
    - name: K6_PROMETHEUS_RW_SERVER_URL
      value: http://prometheus-server.dependencies.svc:80/api/v1/write
    - name: K6_PROMETHEUS_RW_PUSH_INTERVAL
      value: 1s
    - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
      value: "true"
    image: ghcr.io/grafana/operator:latest-runner
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/status
        port: 6565
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: k6
    ports:
    - containerPort: 6565
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/status
        port: 6565
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /test
      name: k6-test-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-dqmj2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: timestamp-2
  nodeSelector:
    node: k6
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: timestamp-tyk-configmap
    name: k6-test-volume
  - name: kube-api-access-dqmj2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-08-12T15:01:03Z"
    message: '0/5 nodes are available: 1 node(s) didn''t match pod anti-affinity rules,
      4 node(s) didn''t match Pod''s node affinity/selector. preemption: 0/5 nodes
      are available: 1 No preemption victims found for incoming pod, 4 Preemption
      is not helpful for scheduling..'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

Any ideas?

I guess this is expected based on the separate flag implementation.

I have not really looked into the code, but if I schedule two jobs with different nodeselectors, that could probably break the separate logic.

Hi @zalbiraw, welcome to the forum :wave:

I’m a bit confused about which use case exactly you’re trying to set up. It sounds to me like you ought to be using affinity instead of separate.

separate in k6-operator is meant as a quick shortcut. And yes, it’s implemented via affinity rules. It will be sufficient for some cases but not all, because it is hard-coded. If your requirements are more complex, you can pass affinity to the pod yourself.

Hello @olhayevtushenko thank you for the reply.

I wanted to use the separate flag to ensure that the k6 CRDs I created don’t all run at the same time. I also did not want those jobs to run outside of the specific node I created for these jobs.

I think there is an issue with the feature implementation: if I am running an x-node cluster and want to set the parallelism value higher than x, I won’t be able to, because the anti-affinity rules will prevent multiple runner pods from being scheduled on the same node.
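
To make the conflict concrete, here is what the two constraints on each runner pod boil down to (simplified from the pod spec above; the operator’s actual rule also matches on app: k6):

nodeSelector:
  node: k6                # only nodes labeled node=k6 are eligible
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # hard rule added by separate: true
    - topologyKey: kubernetes.io/hostname             # at most one runner per node
      labelSelector:
        matchExpressions:
        - key: runner
          operator: In
          values:
          - "true"

With a single node labeled node=k6, the first runner occupies it and every additional runner is unschedulable, which matches the Pending status above.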

I think this is worth a GitHub issue, no?

It sounds like a case for the affinity option, specifically preferredDuringSchedulingIgnoredDuringExecution (Kubernetes docs).

In the K6 spec, it’d be something like:

runner:
  nodeselector:
    node: k6
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: runner
              operator: In
              values:
              - "true"

Could you please try that kind of configuration?

This will not exactly solve my case, because it will now allow all the CRDs to execute at the same time.

The solution would be similar, however: I can set the separate flag to false and add this anti-affinity rule to the starter pod.

Thank you for your help!

If anyone comes across this, here is the solution.

Thanks again for the help @olhayevtushenko

  initializer:
    metadata:
      labels:
        initializer: "k6"
    nodeselector:
      node: k6
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: initializer
              operator: In
              values:
              - "k6"
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: runner
              operator: In
              values:
              - "true"
  starter:
    nodeselector:
      node: k6
  runner:
    nodeselector:
      node: k6

Good to know the issue is resolved. Thanks for sharing your final solution @zalbiraw!