Loki StatefulSet Scheduling Issues on EKS with Multi-AZ EBS Volumes

We recently deployed Loki in an EKS management cluster and ran into scheduling issues related to EBS volume zone affinity and Loki ingester topology. Posting the details here in case it helps others running Loki on Kubernetes with AWS EBS.

Environment

  • Kubernetes: Amazon EKS

  • Storage: AWS EBS (gp2)

  • Deployment: Loki Helm Chart

  • Cluster: multi-AZ (us-west-2a / us-west-2b / us-west-2c)

  • Loki components deployed in Kubernetes (distributed mode)

The cluster node groups span multiple AZs, which is standard for EKS.

Issue:

Loki ingesters use persistent volumes. On AWS, EBS volumes are single-AZ resources.

When a PVC is created, the corresponding PV is provisioned in a specific AZ.

Example PV:

labels:
  topology.kubernetes.io/zone: us-west-2b
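
The zone constraint is enforced through the PV's node affinity, not just the label. A representative (trimmed) PV as provisioned by the EBS CSI driver is sketched below; the volume ID is hypothetical, and older driver versions use the key topology.ebs.csi.aws.com/zone instead of topology.kubernetes.io/zone:

apiVersion: v1
kind: PersistentVolume
spec:
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # hypothetical EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-west-2b   # the scheduler can only place the pod on nodes in this AZ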

During node drains (for example, during EKS upgrades or autoscaling), pods were rescheduled onto nodes in other AZs.

Example scheduler error:

0/8 nodes are available:
4 node(s) didn't match PersistentVolume's node affinity

The pod could not start because the PV existed in a different availability zone.

Why does this happen?

Loki’s zone-aware replication uses logical zones (zone-a, zone-b, zone-c), but these do not automatically map to AWS availability zones.

As a result:

loki-ingester-zone-a pod
→ may schedule on a node in any AZ
→ its PVC/PV is provisioned in whichever AZ that node happens to be in

Later, during rescheduling, Kubernetes enforces:

PV zone must match node zone

which causes scheduling failures.
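
By default the chart creates the logical zones with no node selector at all, which is why zone-a pods can land anywhere. A sketch of the relevant values, assuming the grafana/loki chart's zoneAwareReplication layout (key names can differ between chart versions):

ingester:
  zoneAwareReplication:
    enabled: true
    zoneA:
      nodeSelector: null   # no AZ mapping: zone-a pods may schedule in any AWS AZ
    zoneB:
      nodeSelector: null
    zoneC:
      nodeSelector: null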


Additional Scheduling Constraints

Ingester pods also use pod anti-affinity:

only one ingester per node
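
In the rendered StatefulSet this typically shows up as a required anti-affinity rule along these lines (a sketch; the exact label selector depends on the chart version):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: ingester   # assumed component label
        topologyKey: kubernetes.io/hostname         # at most one ingester per node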

Combined with:

  • node drains

  • autoscaling

  • limited node capacity

this can lead to pods remaining Pending during cluster operations.

Example error:

2 node(s) didn't match pod anti-affinity rules
4 node(s) didn't match Pod's node affinity/selector

Solution:

We solved the issue by pinning Loki's stateful components to a single AZ, and this is also what Loki recommends.

Example Helm configuration:

ingester:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b
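
The same selector should go on every component that keeps a persistent volume; a sketch assuming the distributed chart's component names (verify against your chart version):

compactor:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b
indexGateway:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b
ruler:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b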

This ensures:

Pod zone = Node zone = PV zone

which satisfies EBS volume constraints.
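
If zone-aware replication is worth keeping, newer versions of the grafana/loki chart appear to expose per-zone node selectors similar to Mimir's, which would pin each logical zone to one AWS AZ instead of pinning everything to a single AZ. A hedged sketch assuming the zoneA/zoneB/zoneC layout; whether your chart version supports this is worth checking:

ingester:
  zoneAwareReplication:
    enabled: true
    zoneA:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a   # zone-a pods and their PVs stay in us-west-2a
    zoneB:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2b
    zoneC:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2c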

Is there a better solution? If others are running Loki on EKS with EBS storage, it would be great to hear how you handle multi-AZ scheduling for ingesters.

For Mimir, it works fine using:

zoneAwareReplication:
  enabled: true
  topologyKey: kubernetes.io/hostname
  zones:
    - name: zone-a
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a
    - name: zone-b
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2b
    - name: zone-c
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2c

Are you using the EBS CSI driver?

Yes, we have the EBS CSI driver. We also run Mimir across multiple zones and it works fine with the zoneAwareReplication config shown above. The backend is S3, but EBS is still needed for local storage.


How about the autoscaler?

We don’t run Loki on EKS, so I can’t test for you, but we have other workloads on EKS with stateful EBS volumes and Karpenter, and I’ve never seen this on our cluster.

Yes, we have autoscalers installed, but they don’t help here because Loki’s logical zones can’t be mapped to the AWS availability zones of the EBS volumes backing the PVCs.
And we can’t move Loki out of EKS :slightly_frowning_face: