Loki StatefulSet Scheduling Issues on EKS with Multi-AZ EBS Volumes

We recently deployed Loki in an EKS management cluster and ran into scheduling issues related to EBS volume zone affinity and Loki ingester topology. Posting the details here in case it helps others running Loki on Kubernetes with AWS EBS.

Environment

  • Kubernetes: Amazon EKS

  • Storage: AWS EBS (gp2)

  • Deployment: Loki Helm Chart

  • Cluster: multi-AZ (us-west-2a / us-west-2b / us-west-2c)

  • Loki components deployed in Kubernetes (distributed mode)

The cluster node groups span multiple AZs, which is standard for EKS.

Issue:

Loki ingesters use persistent volumes. On AWS, EBS volumes are single-AZ resources.

When a PVC is created, the corresponding PV is provisioned in a specific AZ.

Example PV:

labels:
  topology.kubernetes.io/zone: us-west-2b
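
The zone constraint is enforced through the PV's node affinity, not just the label. A representative (trimmed) PV as provisioned by the EBS CSI driver is sketched below; the volume ID is hypothetical, and older driver versions use the key topology.ebs.csi.aws.com/zone instead of topology.kubernetes.io/zone:

apiVersion: v1
kind: PersistentVolume
spec:
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # hypothetical EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-west-2b   # the scheduler can only place the pod on nodes in this AZ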

During node drains (for example, during EKS upgrades or autoscaling), pods were rescheduled onto nodes in other AZs.

Example scheduler error:

0/8 nodes are available:
4 node(s) didn't match PersistentVolume's node affinity

The pod could not start because the PV existed in a different availability zone.

Why does this happen?

Loki’s zone-aware replication uses logical zones (zone-a, zone-b, zone-c), but these do not automatically map to AWS availability zones.

As a result:

loki-ingester-zone-a pod
→ may schedule on a node in any AZ
→ its PVC/PV is provisioned in whichever AZ that node happens to be in

Later, during rescheduling, Kubernetes enforces:

PV zone must match node zone

which causes scheduling failures.
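
By default the chart creates the logical zones with no node selector at all, which is why zone-a pods can land anywhere. A sketch of the relevant values, assuming the grafana/loki chart's zoneAwareReplication layout (key names can differ between chart versions):

ingester:
  zoneAwareReplication:
    enabled: true
    zoneA:
      nodeSelector: null   # no AZ mapping: zone-a pods may schedule in any AWS AZ
    zoneB:
      nodeSelector: null
    zoneC:
      nodeSelector: null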


Additional Scheduling Constraints

Ingester pods also use pod anti-affinity:

only one ingester per node
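
In the rendered StatefulSet this typically shows up as a required anti-affinity rule along these lines (a sketch; the exact label selector depends on the chart version):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: ingester   # assumed component label
        topologyKey: kubernetes.io/hostname         # at most one ingester per node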

Combined with:

  • node drains

  • autoscaling

  • limited node capacity

this can lead to pods remaining Pending during cluster operations.

Example error:

2 node(s) didn't match pod anti-affinity rules
4 node(s) didn't match Pod's node affinity/selector

Solution:

We solved the issue by pinning Loki's stateful components to a single AZ, and this is also what Loki recommends.

Example Helm configuration:

ingester:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b
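
The same selector should go on every component that keeps a persistent volume; a sketch assuming the distributed chart's component names (verify against your chart version):

compactor:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b
indexGateway:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b
ruler:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2b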

This ensures:

Pod zone = Node zone = PV zone

which satisfies EBS volume constraints.
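
If zone-aware replication is worth keeping, newer versions of the grafana/loki chart appear to expose per-zone node selectors similar to Mimir's, which would pin each logical zone to one AWS AZ instead of pinning everything to a single AZ. A hedged sketch assuming the zoneA/zoneB/zoneC layout; whether your chart version supports this is worth checking:

ingester:
  zoneAwareReplication:
    enabled: true
    zoneA:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a   # zone-a pods and their PVs stay in us-west-2a
    zoneB:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2b
    zoneC:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2c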

Is there a better solution? If others are running Loki on EKS with EBS storage, it would be great to hear how you handle multi-AZ scheduling for ingesters.

For Mimir, it works fine using:

zoneAwareReplication:
  enabled: true
  topologyKey: kubernetes.io/hostname
  zones:
    - name: zone-a
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a
    - name: zone-b
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2b
    - name: zone-c
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2c

Are you using the EBS CSI driver?

Yes, we have the EBS CSI driver. We also run Mimir across multiple zones and it works fine with the zoneAwareReplication config shown above. The backend is S3, but EBS is still needed for local storage.


How about the autoscaler?

We don’t run Loki on EKS, so I can’t test for you, but we have other workloads on EKS with stateful EBS volumes and Karpenter, and I’ve never seen this on our cluster.

Yes, we have autoscalers installed, but they don’t help here because Loki’s logical zones can’t be mapped to the AWS availability zones of the EBS volumes backing the PVCs.
And we can’t move Loki out of EKS :slightly_frowning_face: