Running each pod on a separate EKS node (with node groups)

Hey everyone,

I’m using the k6 operator and trying to run a distributed load test on EKS with AWS auto-scaling node groups.

I’ve set up the cluster autoscaler, and it does autoscale my nodes if I set resource requests like this in my K6 CRD:

        resources:
          limits:
            cpu: 600m
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 1Gi

However, my goal is to have every pod during my load test be run on a separate node (which means I’d like my node group to autoscale to the size of parallelism in the CRD).

I tried doing this with pod anti-affinity rules and couldn’t get it to work.

I then tried using separate: true as outlined in the operator documentation and I’m getting some strange behaviour.

For example, if I have parallelism set to 10 and an EKS node group with a minimum of 2 and a maximum of 10 nodes, setting separate: true creates just one additional node, giving me 3 nodes, and all the other pods remain in a Pending state.

If I cancel and run the test again, the same thing happens: I get one more node, for a total of 4, and the remaining pods stay Pending.

Any idea why this is happening? I’d appreciate any help.

Here’s my CRD file:

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
  labels:
    app: load
spec:
  parallelism: 10
  script:
    configMap:
      name: "crocodile-stress-test"
      file: "test.js"
  separate: true
  arguments: --out statsd
  runner:
    metadata:
      labels:
        app: load
    # resources:
    #   limits:
    #     cpu: 600m
    #     memory: 1Gi
    #   requests:
    #     cpu: 100m
    #     memory: 1Gi
    env:
      - name: K6_STATSD_ADDR
        value: "statsd-service:8125"
    # affinity:
    #   podAntiAffinity:
    #     requiredDuringSchedulingIgnoredDuringExecution:
    #       - labelSelector:
    #           matchExpressions:
    #             - key: app
    #               operator: In
    #               values:
    #                 - load
    #         topologyKey:

Hi @elguaposalsero,
Welcome to the forum :wave:

It sounds like there’s an issue with your EKS or cluster-autoscaler setup. separate: true should have been enough to allocate additional nodes in the scenario you described. I’d recommend trying to find out whether cluster-autoscaler is healthy and what reason exactly is given for FailedScheduling:
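
To see the exact reason, you can describe one of the pending runner pods and look at its Events section, and check the cluster-autoscaler logs. The pod and namespace names below are placeholders, and I’m assuming the default cluster-autoscaler deployment name in kube-system:

        kubectl describe pod <pending-runner-pod> -n <your-namespace>
        kubectl -n kube-system logs deployment/cluster-autoscaler

The FailedScheduling event on the pending pod usually says exactly why the scheduler (and, in turn, the autoscaler) couldn’t place it.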

Checking whether there are any known issues related to your specific versions of EKS and cluster-autoscaler might also help.
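
On the anti-affinity attempt: the commented-out topologyKey in your CRD has no value, and a required podAntiAffinity only spreads pods one per node when topologyKey is set to kubernetes.io/hostname. A sketch of what that could look like in the runner section, reusing your app: load label (an illustration, not tested against your cluster):

        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - load
                topologyKey: kubernetes.io/hostname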

Hope that helps!
