Affinity issue when using nodeselector + parallelism + separate

I am running into a weird issue with node affinity, the separate flag, and parallelism.

Whenever I set the nodeselector value for my pods, they get stuck in a Pending state claiming that the node affinity is not met…

If I set the parallelism value to 1 with the separate flag set to true + nodeselector, everything works fine.
If I set the parallelism value to more than 1 with the separate flag set to false + nodeselector, everything works fine.
If I set the parallelism value to more than 1 with the separate flag set to true and without a nodeselector, everything works fine.

However, if I set the parallelism value to more than 1 with the separate flag set to true + nodeselector, the starter never executes and all the pods are stuck in Pending.

  initializer:
    nodeselector:
      node: k6
  starter:
    nodeselector:
      node: k6
  runner:
    nodeselector:
      node: k6
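
For reference, the custom resource I'm applying looks roughly like this (field values other than the ones quoted above are reconstructed from the generated pod below, so treat the exact names as placeholders):

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: timestamp
  namespace: k6
spec:
  parallelism: 2        # more than 1
  separate: true        # one runner per node
  script:
    configMap:
      name: timestamp-tyk-configmap   # mounted at /test in the runner pod
      file: timestamp.js
  runner:
    nodeselector:
      node: k6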

Here is one of the pod definitions:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cloud.google.com/cluster_autoscaler_unhelpable_since: 2023-08-12T15:01:06+0000
    cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
  creationTimestamp: "2023-08-12T15:01:03Z"
  finalizers:
  - batch.kubernetes.io/job-tracking
  generateName: timestamp-2-
  labels:
    app: k6
    batch.kubernetes.io/controller-uid: 99b6f5e8-35a5-49b7-b842-9853d67e2061
    batch.kubernetes.io/job-name: timestamp-2
    controller-uid: 99b6f5e8-35a5-49b7-b842-9853d67e2061
    job-name: timestamp-2
    k6_cr: timestamp
    runner: "true"
  name: timestamp-2-9c5zf
  namespace: k6
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: timestamp-2
    uid: 99b6f5e8-35a5-49b7-b842-9853d67e2061
  resourceVersion: "141199"
  uid: 6cfe5a60-de09-4d3b-b2bf-6800c45c3eb0
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - k6
          - key: runner
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
  automountServiceAccountToken: true
  containers:
  - command:
    - k6
    - run
    - --execution-segment=1/4:2/4
    - --execution-segment-sequence=0,1/4,2/4,3/4,1
    - --out
    - experimental-prometheus-rw
    - --tag
    - testid=tyk-timestamp-keyless-6aUq31pW8f
    - /test/timestamp.js
    - --address=0.0.0.0:6565
    - --paused
    - --tag
    - instance_id=2
    - --tag
    - job_name=timestamp-2
    env:
    - name: K6_PROMETHEUS_RW_SERVER_URL
      value: http://prometheus-server.dependencies.svc:80/api/v1/write
    - name: K6_PROMETHEUS_RW_PUSH_INTERVAL
      value: 1s
    - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
      value: "true"
    image: ghcr.io/grafana/operator:latest-runner
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/status
        port: 6565
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: k6
    ports:
    - containerPort: 6565
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/status
        port: 6565
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /test
      name: k6-test-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-dqmj2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: timestamp-2
  nodeSelector:
    node: k6
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: timestamp-tyk-configmap
    name: k6-test-volume
  - name: kube-api-access-dqmj2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-08-12T15:01:03Z"
    message: '0/5 nodes are available: 1 node(s) didn''t match pod anti-affinity rules,
      4 node(s) didn''t match Pod''s node affinity/selector. preemption: 0/5 nodes
      are available: 1 No preemption victims found for incoming pod, 4 Preemption
      is not helpful for scheduling..'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

Any ideas?

I guess this is expected based on the separate flag implementation.

I have not really looked into the code, but if I schedule two jobs with different nodeselectors, that could probably break the separate logic.

Hi @zalbiraw, welcome to the forum :wave:

I’m a bit confused about which use case exactly you’re trying to set up. It sounds to me like you ought to be using affinity instead of separate.

separate in k6-operator is meant as a quick shortcut. And yes, it’s implemented via affinity rules. It will be sufficient for some cases but not all, because it is hard-coded. If your requirements are more complex, you can pass affinity to the pod yourself.

Hello @olhayevtushenko thank you for the reply.

I wanted to use the separate flag to ensure that the k6 CRDs I created don’t all run at the same time. I also did not want those jobs to run outside of the specific node I created for these jobs.

I think there is an issue with the feature implementation: if I am running an x-node cluster and want to set the parallelism value higher than x, I won’t be able to, because the anti-affinity rules will prevent multiple runner pods from being scheduled on the same node.
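
To make the conflict concrete, here is what the two constraints on each runner pod boil down to (simplified from the pod spec above; the operator’s actual rule also matches on app: k6):

nodeSelector:
  node: k6                # only nodes labeled node=k6 are eligible
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # hard rule added by separate: true
    - topologyKey: kubernetes.io/hostname             # at most one runner per node
      labelSelector:
        matchExpressions:
        - key: runner
          operator: In
          values:
          - "true"

With a single node labeled node=k6, the first runner occupies it and every additional runner is unschedulable, which matches the Pending status above.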

I think this is worth a GitHub issue, no?

It sounds like a case for the affinity option, specifically preferredDuringSchedulingIgnoredDuringExecution (Kubernetes docs).

In the K6 spec, it’d be something like:

runner:
  nodeselector:
    node: k6
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: runner
              operator: In
              values:
              - "true"

Could you please try that kind of configuration?

This will not exactly solve my case, because it will now allow all the CRDs to execute at the same time.

The solution would be similar, however: I can set the separate flag to false and add this anti-affinity rule to the starter pod.

Thank you for your help!

If anyone comes across this, here is the solution.

Thanks again for the help @olhayevtushenko

  initializer:
    metadata:
      labels:
        initializer: "k6"
    nodeselector:
      node: k6
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: initializer
              operator: In
              values:
              - "k6"
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: runner
              operator: In
              values:
              - "true"
  starter:
    nodeselector:
      node: k6
  runner:
    nodeselector:
      node: k6

Good to know the issue is resolved. Thanks for sharing your final solution @zalbiraw!