Behavior when replacing Loki PVs or modifying replicas in simple scalable mode

We’re running Loki in simple scalable mode with single-store TSDB, and we’re looking to migrate to a new EKS cluster under the same AWS account:

  1. What is the behavior if the Loki PVs (loki-read, loki-write, and loki-backend) are deleted but the underlying S3 bucket stays the same? Will we lose any data?
  2. If we change the number of replicas for loki-read, loki-write, and loki-backend, do we need to take any additional steps to prevent data loss?

Our Loki Helm chart config looks as follows:

```yaml
loki:
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/loki
    tag: 3.1.1
  auth_enabled: false
  storage:
    type: s3
    bucketNames:
      chunks: ${s3_bucket_name}
      ruler: ${s3_bucket_name}
      admin: ${s3_bucket_name}
    s3:
      region: ${region}
  pattern_ingester:
    enabled: true
  compactor:
    retention_enabled: true
    delete_request_store: "s3"
  schemaConfig:
    configs:
      - from: 2022-01-11
        index:
          period: 24h
          prefix: loki_ops_index_
        object_store: s3
        schema: v13
        store: tsdb
```

1. I don’t believe you need PVs for loki-read.
2. loki-write stores its write-ahead log (WAL) on the PVs, so you’ll want to gracefully terminate your loki-write containers during the migration by invoking /ingester/shutdown (see the Loki HTTP API documentation). A sketch of automating this is below the list.
3. loki-backend stores delete marker files on its PVs. Losing these is not a huge deal; the compactor will just miss deleting some chunks.
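
For reference, here’s a minimal sketch of wiring that shutdown call into the write pods as a preStop hook. This assumes your chart version exposes a `write.lifecycle` value and that wget is available in the loki image; verify both against your setup before relying on it:

```yaml
# Sketch only: check that your chart version supports `write.lifecycle`
# and that the loki image ships wget.
write:
  lifecycle:
    preStop:
      exec:
        command:
          - /bin/sh
          - -c
          # POST to the shutdown endpoint; flush=true flushes the WAL
          # to object storage before the container exits.
          - "wget -qO- --post-data='' 'http://localhost:3100/ingester/shutdown?flush=true'"
```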

My understanding is you can pretty freely scale loki-read and loki-backend up and down (if you scale loki-backend, make sure your rulers form a ring, or you’ll get duplicate alerts; see the sketch below). Scaling down loki-write is trickier, because you want to make sure whatever WAL data is on the PVs gets flushed first. I haven’t had to do this (we simply never scale loki-write down), but check the community Helm chart code; I suspect this may already be taken care of.
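
As a sketch of the ruler-ring piece, assuming your chart exposes `loki.rulerConfig` and you’re using memberlist (the chart’s default kvstore); verify the field names against the Loki ruler config reference for your version:

```yaml
# Sketch: have rulers join a ring and shard rule groups between them,
# so a scaled-up backend doesn't fire the same alert from every replica.
# Assumes memberlist, the simple scalable chart's default kvstore.
loki:
  rulerConfig:
    enable_sharding: true
    ring:
      kvstore:
        store: memberlist
```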

Got it, thanks! What happens if we don’t run the shutdown? My guess is we’d just lose the logs that were in flight, but the older data would be fine? (We want to automate this for many clusters, so skipping it would be easiest.)

If you don’t run the shutdown, you shouldn’t lose anything as long as the PVs are still around (the WAL is replayed when the pod comes back). If you intend to automate scaling of the writers, I would recommend running at least 2 replicas for the ingesters so each stream is replicated.
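
A sketch of what that could look like in the values, assuming the field names of the grafana/loki simple scalable chart (verify against your chart version):

```yaml
# Sketch: replicate each incoming stream across multiple ingesters so
# that one write pod's WAL is never the only copy of unflushed data.
loki:
  commonConfig:
    replication_factor: 3   # chart default; each stream is written to 3 ingesters
write:
  replicas: 3               # keep replicas >= replication_factor
```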