Slow queries and cannot access old logs - Loki 3 migration

I am attempting to use this guide - Migrate from `loki-distributed` Helm chart | Grafana Loki documentation - to migrate from loki-distributed to the new Loki 3 Helm chart. I am also migrating from boltdb-shipper to tsdb.

I configured the new Loki to join the old Loki’s memberlist, and it can see logs in the cluster. However, the new Grafana datasource that points at the new Loki takes around 5 seconds for a 7-day query that returns only kilobytes of data, while the old Loki answers the same query in under a second. Additionally, the new Loki does not see the old logs. What could I be doing wrong?

Loki config:

migrate:
  fromDistributed:
    enabled: true
    memberlistService: loki-loki-distributed-memberlist

nameOverride: loki-next
fullnameOverride: loki-next

loki:
  containerSecurityContext:
    readOnlyRootFilesystem: false
  auth_enabled: false
  schemaConfig:
    configs:
    - from: 2024-07-15
      store: boltdb-shipper
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
    - from: "2025-05-28"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
  storage_config:
    aws:
      region: ${region}
      bucketnames: ${bucket_chunks}
      s3forcepathstyle: true
    boltdb_shipper:
      active_index_directory: /var/loki/index
      cache_location: /var/loki/boltdb-cache
      cache_ttl: 168h
      index_gateway_client:
        server_address: dns:///loki-next-index-gateway:9095
    tsdb_shipper:
      active_index_directory: /var/loki/tsdb-index
      cache_location: /var/loki/tsdb-cache
  storage:
    type: "s3"
    s3:
      region: ${region}
    bucketNames:
      chunks: ${bucket_chunks}
      ruler: ${bucket_ruler}
  ingester:
    chunk_encoding: snappy
  querier:
    max_concurrent: 16
    query_ingesters_within: 3h
  query_scheduler:
    max_outstanding_requests_per_tenant: 64000
  pattern_ingester:
    enabled: true
  limits_config:
    allow_structured_metadata: false
    split_queries_by_interval: 15m
    allow_deletes: true
    ingestion_burst_size_mb: 200
    ingestion_rate_mb: 100
    ingestion_rate_strategy: local
    max_cache_freshness_per_query: 10m
    max_global_streams_per_user: 10000
    max_query_length: 12000h
    max_query_parallelism: 16
    max_streams_per_user: 0
    per_stream_rate_limit: 1000MB
    per_stream_rate_limit_burst: 200MB
    query_timeout: 5m
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    retention_period: 2880h
  compactor:
    working_directory: /var/loki/retention
    compaction_interval: 10m
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
    retention_enabled: true
    delete_request_store: s3
  query_range:
    # make queries more cache-able by aligning them with their step intervals
    align_queries_with_step: true
    max_retries: 5
    cache_results: true

    results_cache:
      cache:
        enable_fifocache: true
        fifocache:
          max_size_items: 1024
          validity: 24h
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    compactor_address: "http://loki-next-compactor:3100"

serviceAccount:
  name: logs-manager
  annotations:
    "eks.amazonaws.com/role-arn": ${service_account_arn}

deploymentMode: Distributed

chunksCache:
  allocatedMemory: 11264
  batchSize: 6
  parallelism: 9
  maxItemMemory: 9
  nodeSelector:
    role: metricsmem
  tolerations:
    - key: role
      operator: Equal
      value: metricsmem
      effect: NoSchedule

queryScheduler:
  replicas: 2
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
querier:
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
  replicas: 2
  maxUnavailable: 1
  resources:
    requests:
      cpu: 2
      memory: 2Gi
    limits:
      cpu: 3
      memory: 4Gi
indexGateway:
  maxUnavailable: 2
  replicas: 3
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi
compactor:
  replicas: 1
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
ingester:
  resources:
    limits:
      cpu: "2"
      memory: 2Gi
    requests:
      cpu: "1"
      memory: 1Gi
  replicas: 3
  maxUnavailable: 1
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
distributor:
  replicas: 2
  maxUnavailable: 1
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
queryFrontend:
  replicas: 3
  maxUnavailable: 2
  resources:
    limits:
      cpu: "3"
      memory: 6Gi
    requests:
      cpu: "2"
      memory: 4Gi
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule
gateway:
  service:
    type: NodePort
    nodePort: 32438
  image:
    # -- The Docker registry for the gateway image
    registry: docker.io
    # -- The gateway image repository
    repository: nginxinc/nginx-unprivileged
    # -- The gateway image tag
    tag: 1.27.3-alpine
    # -- The gateway image pull policy
    pullPolicy: IfNotPresent
  nodeSelector:
    role: metrics
  tolerations:
    - key: role
      operator: Equal
      value: metrics
      effect: NoSchedule

bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0

backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

singleBinary:
  replicas: 0

minio:
  enabled: false

Thanks in advance!

It seems like this has more to do with the migration from boltdb-shipper to tsdb. I added the following configuration to the old loki-distributed chart and now it is also timing out on a 7-day query:

schemaConfig:
    configs:
      - from: "2020-05-15"
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h
      - from: "2025-06-05"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
storageConfig:
    aws:
      s3: ${bucket}
      s3forcepathstyle: true
      insecure: false
    boltdb_shipper:
      active_index_directory: /var/loki/index
      shared_store: s3
      cache_location: /var/loki/boltdb-cache
      cache_ttl: 168h
      index_gateway_client:
        server_address: dns:///loki-distributed-index-gateway:9095
    tsdb_shipper:
      active_index_directory: /var/loki/tsdb-index
      cache_location: /var/loki/tsdb-cache
      index_gateway_client:
        server_address: dns:///loki-distributed-index-gateway:9095

Hi @sarasensible, this is a tough one due to a lot of moving parts at one time. From what I can see, however, you have a lot of mismatching in your schema:

  schemaConfig:
    configs:
    - from: 2024-07-15
      store: boltdb-shipper
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
    - from: "2025-05-28"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h

then you change back to:

schemaConfig:
    configs:
      - from: "2020-05-15"
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h
      - from: "2025-06-05"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h

The new schema_config needs to extend the old one.

Here is an example of a schema evolving over time:

schema_config:
  configs:
    - from: "2020-09-07"
      index:
        period: 24h
        prefix: loki_index_
      object_store: s3
      schema: v11
      store: boltdb-shipper
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
    - from: 2022-10-22
      store: boltdb-shipper
      object_store: s3
      schema: v12
      index:
        prefix: index_
        period: 24h
    - from: "2023-05-03"
      index:
        period: 24h
        prefix: tsdb_index_
      object_store: s3
      schema: v12
      store: tsdb

Hi @jayclifford, thanks so much for your response!

Can you please explain why you start with the prefix “loki_index_”, go to an intermediate “index_”, and then end at “tsdb_index_”? What is that accomplishing, and why would I start with a prefix I have never used?

Also, why wouldn’t you migrate from schema v11 straight to the newest, v13? Why go through the intermediate v12?

Thank you!

After doing a bit more digging, it seems to me that @jayclifford's answer was meant to be illustrative rather than actually suggesting a new configuration.

The idea is to do a migration gradually and keep the old schema config around in order to preserve the ability to read the old logs.

As a result, I am going to use the following schema config when I go to prod:

schemaConfig:
    configs:
      - from: "2020-05-15"
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h
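      # step 1: switch the store from boltdb-shipper to tsdb while keeping schema v11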
      - from: "2025-06-05"
        store: tsdb
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h
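      # step 2: bump the schema to v13 once already running on tsdb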
      - from: "2025-06-09"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h

This migrates first from boltdb-shipper to tsdb, preserving the same schema v11, then switches to schema v13 after it is already on the tsdb. I got rid of the different prefixes since I don’t think these accomplish anything.

I will update this post with the results once I have switched over.

Hi @sarasensible,
My apologies, my last message was a little cryptic, but the idea was to maintain the schema version for your historic data. In the original Loki config you sent in, you had changed boltdb-shipper to schema: v13, which would have been invalid:

    - from: 2024-07-15
      store: boltdb-shipper
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
    - from: "2025-05-28"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
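
To make this concrete, here is a rough sketch of the shape to aim for (dates are placeholders): keep the boltdb-shipper entry exactly as it was in the old loki-distributed chart - v11 with its original from date - and introduce v13 only on the new tsdb entry, with a from date that is still in the future when you apply it:

schemaConfig:
  configs:
    - from: "2020-05-15"
      store: boltdb-shipper
      object_store: s3
      schema: v11          # unchanged, matches how the historic index was written
      index:
        prefix: index_
        period: 24h
    - from: "2025-06-09"   # placeholder: must still be in the future at rollout
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h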

Once I understood that the schema needs to stay consistent in order to read old logs - meaning you can’t write data under one schema and later change that entry to another; you can only add new, future-dated entries - this all went smoothly. I configured the schema to change over in the future as documented and kept the old schema entry the same so that I could still read past logs. Loki upgrade, check!