Loki writer errors after migrating from 5.14.1 to 6.27.0

I am migrating Loki from version 5.14.1 to 6.27.0 using Helm, and I am getting several errors. I am configuring a multi-tenant setup.

Currently I am using this configuration:

global:
  dnsService: coredns
test:
  enabled: false
monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
loki:
  auth_enabled: true
  schemaConfig:
    configs:
      - from: "2023-02-28"
        index:
          prefix: loki_ops_index_
          period: 24h
        object_store: s3
        schema: v11
        store: tsdb
      - from: "2025-02-28"
        index:
          prefix: loki_ops_index_
          period: 24h
        object_store: s3
        schema: v13
        store: tsdb
  querier:
    multi_tenant_queries_enabled: true
  storage:
    bucketNames:
      chunks: bucket-chunks
      ruler: bucket-ruler
      admin: bucket-admin
    s3:
      endpoint: https://<ENDPOINT>/
      region: <REGION>
      secretAccessKey: <KEY>
      accessKeyId: <ACCESS_KEY>
  limits_config:
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_cache_freshness_per_query: 10m
    split_queries_by_interval: 15m
    query_timeout: 300s
    volume_enabled: true
ingress:
  enabled: true
  ingressClassName: nginx
  annotations:
    cert-manager.io/cluster-issuer: prod
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-realm: Tenants
    nginx.ingress.kubernetes.io/auth-secret: <TENANT_SECRET_NAME>
    nginx.ingress.kubernetes.io/auth-secret-type: auth-file
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header X-Scope-OrgID $remote_user;
  hosts:
    - LOKI_DOMAIN_NAME
  tls:
    - secretName: tls
      hosts:
        - LOKI_DOMAIN_NAME
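
For reference on the multi-tenant wiring above: the basic-auth username becomes $remote_user, which the configuration-snippet forwards as the X-Scope-OrgID tenant header, and with auth-secret-type: auth-file ingress-nginx expects the referenced secret to contain a single "auth" key in htpasswd format. A minimal sketch of that secret (the name is the same placeholder as above and the hashes are illustrative, not real values):

apiVersion: v1
kind: Secret
metadata:
  name: <TENANT_SECRET_NAME>   # must match the auth-secret annotation
type: Opaque
stringData:
  # one htpasswd entry per tenant; the username doubles as the Loki tenant ID (X-Scope-OrgID)
  auth: |
    org1:<HTPASSWD_HASH>
    org2:<HTPASSWD_HASH>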

I am not sure whether I have to delete the PVCs of the old loki-write pods, but without deleting them I noticed errors like this:

level=error ts=2025-02-27T21:31:17.857503924Z caller=flush.go:261 component=ingester loop=19 org_id=org1|org2 msg="failed to flush" retries=7 err="failed to flush chunks: multiple org IDs present, num_chunks: 1, labels: {app=\"label1\", chart=\"mychart\", cluster=\"mycluster\", component=\"mycomponent\", filename=\"/var/log/pods/..../registry/0.log\", heritage=\"Helm\", job=\"monitoring/kubernetes-logs\", namespace=\"my_namespace\", pod=\"mypod\", pod_template_hash=\"...\", release=\"myrelease\"}"

One of the writers also could not start its ingester, so I deleted the PVC on that one, which solved the readiness issue with the ingester. There no longer seem to be multiple-org-ID errors, but a different one appears:

level=warn ts=2025-03-03T20:14:17.812719897Z caller=grpc_logging.go:76 method=/logproto.Querier/GetChunkIDs duration=204.46µs msg=gRPC err="rpc error: code = Code(499) desc = The request was cancelled by the client."

My data sources work as expected and have a valid connection.
Grafana Cloud also shows this error in the Drilldown view:

An error occurred within the plugin

The dashboard shows the logs, but I am not sure whether everything is configured correctly. So I am not sure how to get rid of these errors in order to have a valid multi-tenant configuration, and how to remove the error in the Grafana Cloud Drilldown UI. Do I have to delete the remaining old PVCs of the loki-write pods? Or is something wrong in the configuration, or am I skipping something on my side?

@tonyswumac could you help me?

Where are you at right now exactly?

You said you have logs in Grafana, which implies you have things working (at least mostly), is that correct? What precisely is the error you are seeing?

I am stuck exactly at this warning or error:

level=warn ts=2025-03-03T20:14:17.812719897Z caller=grpc_logging.go:76 method=/logproto.Querier/GetChunkIDs duration=204.46µs msg=gRPC err="rpc error: code = Code(499) desc = The request was cancelled by the client."

An error also appears in Grafana Cloud, but I created a ticket there. @tonyswumac

In terms of functionality, what issues are you observing?

It looks like it is working, but I am not sure whether that warning message affects the functionality; maybe that is now my question.

I have the same situation. In terms of functionality, we also have a plugin error in Grafana, and every time I try to select a different time range in Logs Explore, I get this error in a red banner:
An error occurred within the plugin
But everything seems to function as expected. I can see all the logs; the error is just very noisy.
Do you have any idea how to fix it, or at least how to ignore it?
Thanks in advance.

@tonyswumac could you help me?

You are gonna have to be a bit more specific. What error do you see from Loki? Have you tried to perform an API call to Loki and see if it’s functional? What’s your configuration?

Hi Tony, thanks for your reply.
We have the Loki Helm chart version 6.27.0 (Grafana Community Kubernetes Helm Charts → loki) with the following values:

loki:
   schemaConfig:
     configs:
       - from: "2024-04-01"
         store: tsdb
         object_store: s3
         schema: v13
         index:
           prefix: loki_index_
           period: 24h
   storage_config:
     aws:
       region: ${ loki_aws_region }
       bucketnames: ${ loki_chunk_bucket_name }
       s3forcepathstyle: false
   ingester:
       chunk_encoding: snappy
   pattern_ingester:
       enabled: true
   limits_config:
     allow_structured_metadata: true
     volume_enabled: true
     retention_period: 672h # 28 days retention
   compactor:
     retention_enabled: true 
     delete_request_store: s3
   ruler:
    enable_api: true
    storage:
      type: s3
      s3:
        region: ${ loki_aws_region }
        bucketnames: ${ loki_ruler_bucket_name }
        s3forcepathstyle: false
    alertmanager_url: http://prom:9093 # The URL of the Alertmanager to send alerts (Prometheus, Mimir, etc.)

   querier:
      max_concurrent: 4

   storage:
      type: s3
      bucketNames:
        chunks: ${ loki_chunk_bucket_name }
        ruler: ${ loki_ruler_bucket_name }
      s3:
        region: ${ loki_aws_region }


serviceAccount:
 create: true
 annotations:
   "eks.amazonaws.com/role-arn": ${ loki_role_arn }

deploymentMode: Distributed

ingester:
 replicas: 3
 persistence:
   storageClass: gp3
   accessModes:
     - ReadWriteOnce
   size: 10Gi

querier:
 replicas: 3
 maxUnavailable: 2
 persistence:
   storageClass: gp3
   accessModes:
     - ReadWriteOnce
   size: 10Gi
queryFrontend:
 replicas: 2
 maxUnavailable: 1
queryScheduler:
 replicas: 2
distributor:
 replicas: 3
 maxUnavailable: 2
compactor:
 replicas: 1
 persistence:
   storageClass: gp3
   accessModes:
     - ReadWriteOnce
   size: 10Gi
indexGateway:
 replicas: 2
 maxUnavailable: 1
 persistence:
   storageClass: gp3
   accessModes:
     - ReadWriteOnce
   size: 10Gi
ruler:
 replicas: 1
 maxUnavailable: 1
 persistence:
   storageClass: gp3
   accessModes:
     - ReadWriteOnce
   size: 10Gi


gateway:
  enabled: false

ingress:
  enabled: true
  ingressClassName: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/group.name: ${ loki_ingress_group }
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: ${ loki_ingress_acm_certificate_arn }
  paths:
    # -- Paths that are exposed by Loki Distributor.
    # If deployment mode is Distributed, the requests are forwarded to the service: `{{"loki.distributorFullname"}}`.
    # If deployment mode is SimpleScalable, the requests are forwarded to write k8s service: `{{"loki.writeFullname"}}`.
    # If deployment mode is SingleBinary, the requests are forwarded to the central/single k8s service: `{{"loki.singleBinaryFullname"}}`
    distributor:
      - /api/prom/push
      - /loki/api/v1/push
      - /otlp/v1/logs
    # -- Paths that are exposed by Loki Query Frontend.
    # If deployment mode is Distributed, the requests are forwarded to the service: `{{"loki.queryFrontendFullname"}}`.
    # If deployment mode is SimpleScalable, the requests are forwarded to the read k8s service: `{{"loki.readFullname"}}`.
    # If deployment mode is SingleBinary, the requests are forwarded to the central/single k8s service: `{{"loki.singleBinaryFullname"}}`
    queryFrontend:
      - /api/prom/query
      # this path covers labels and labelValues endpoints
      - /api/prom/label
      - /api/prom/series
      - /api/prom/tail
      - /loki/api/v1/query
      - /loki/api/v1/query_range
      - /loki/api/v1/tail
      # this path covers labels and labelValues endpoints
      - /loki/api/v1/label
      - /loki/api/v1/labels
      - /loki/api/v1/series
      - /loki/api/v1/index/stats
      - /loki/api/v1/index/volume
      - /loki/api/v1/index/volume_range
      - /loki/api/v1/format_query
      - /loki/api/v1/detected_field
      - /loki/api/v1/detected_fields
      - /loki/api/v1/detected_labels
      - /loki/api/v1/patterns
    # -- Paths that are exposed by Loki Ruler.
    # If deployment mode is Distributed, the requests are forwarded to the service: `{{"loki.rulerFullname"}}`.
    # If deployment mode is SimpleScalable, the requests are forwarded to k8s service: `{{"loki.backendFullname"}}`.
    # If deployment mode is SimpleScalable but `read.legacyReadTarget` is `true`, the requests are forwarded to k8s service: `{{"loki.readFullname"}}`.
    # If deployment mode is SingleBinary, the requests are forwarded to the central/single k8s service: `{{"loki.singleBinaryFullname"}}`
    ruler:
      - /api/prom/rules
      - /api/prom/api/v1/rules
      - /api/prom/api/v1/alerts
      - /loki/api/v1/rules
      - /prometheus/api/v1/rules
      - /prometheus/api/v1/alerts
  hosts:
    - ${ loki_host }

minio:
 enabled: false

backend:
 replicas: 0
read:
 replicas: 0
write:
 replicas: 0

singleBinary:
 replicas: 0

The Grafana pod logs when the error occurs:

logger=tsdb.loki endpoint=queryData pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com fromAlert=false t=2025-03-19T11:10:36.393358581Z level=info msg="Prepared request to Loki" duration=29.24µs queriesLength=1 stage=prepareRequest runInParallel=false
logger=tsdb.loki endpoint=callResource pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com t=2025-03-19T11:10:36.402441355Z level=info msg="Response received from loki" status=ok statusCode=200 contentLength=139 duration=24.227866ms contentEncoding= stage=databaseRequest
logger=context userId=0 orgId=0 uname= t=2025-03-19T11:10:36.431949763Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.100.1.99 time_ms=1 duration=1.02369ms size=105 referer= handler=/api/live/ws status_source=server
logger=tsdb.loki endpoint=queryData pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com fromAlert=false t=2025-03-19T11:10:36.433414037Z level=info msg="Prepared request to Loki" duration=71.591µs queriesLength=1 stage=prepareRequest runInParallel=false
logger=tsdb.loki endpoint=queryData pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com fromAlert=false t=2025-03-19T11:10:36.465830131Z level=info msg="Prepared request to Loki" duration=31.6µs queriesLength=1 stage=prepareRequest runInParallel=false
logger=tsdb.loki endpoint=callResource pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com t=2025-03-19T11:10:36.501274114Z level=info msg="Response received from loki" status=ok statusCode=500 contentLength=10 duration=35.443303ms contentEncoding= stage=databaseRequest
logger=tsdb.loki endpoint=callResource pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com t=2025-03-19T11:10:36.504213391Z level=error msg="Failed resource call from loki" err="empty ring" url="/loki/api/v1/patterns?query=%7Bservice_name%3D%60unknown_service%60%7D&start=2025-03-19T10%3A55%3A36.188Z&end=2025-03-19T11%3A10%3A36.188Z&step=2s"
logger=context userId=3 orgId=1 uname=xxx@xx.com t=2025-03-19T11:10:36.505891277Z level=error msg=InternalError error="[plugin.downstreamError] client: failed to call resources: empty ring" remote_addr=10.100.1.99 traceID=
logger=context userId=3 orgId=1 uname=xxx@xx.com t=2025-03-19T11:10:36.507567883Z level=error msg="Request Completed" method=GET path=/api/datasources/uid/decsjur7gndhcb/resources/patterns status=500 remote_addr=10.100.1.99 time_ms=139 duration=139.091616ms size=116 referer="https://grafana.xxx.com/a/grafana-lokiexplore-app/explore/service/unknown_service/logs?displayedFields=%5B%5D&from=now-15m&patterns=%5B%5D&sortOrder=%22Descending%22&timezone=browser&to=now&urlColumns=%5B%5D&var-ds=decsjur7gndhcb&var-fields=&var-filters=service_name%7C%3D%7Cunknown_service&var-levels=&var-lineFilterV2=&var-lineFilters=&var-metadata=&var-patterns=&visualizationType=%22logs%22&wrapLogMessage=" handler=/api/datasources/uid/:uid/resources/* status_source=downstream
logger=tsdb.loki endpoint=queryData pluginId=loki dsName=Loki dsUID=decsjur7gndhcb uname=xxx@xx.com fromAlert=false t=2025-03-19T11:10:36.779237503Z level=info msg="Response received from loki" duration=345.621965ms stage=databaseRequest statusCode=200 contentLength= start=2025-03-19T10:55:36.188Z end=2025-03-19T11:10:36.188Z step=2s query="{service_name=`unknown_service`}     | json | logfmt | drop __error__, __error_details__ " queryType=range direction=backward maxLines=1000 supportingQueryType=grafana-lokiexplore-app lokiHost=loki.xxx.com lokiPath=/loki/api/v1/query_range status=ok

This is how the error looks in Grafana itself:

The Loki pod logs for the error:

2025-03-19T11:44:19.18566282Z stderr F level=info ts=2025-03-19T11:44:19.104471318Z caller=metrics.go:237 component=frontend org_id=1 latency=fast query="{cluster=`xxx`, service_name=`loki`}    |~ \"(?i)rpc error\" | json | logfmt | drop __error__, __error_details__ " query_hash=2501577606 query_type=filter range_type=range length=15m0s start_delta=15m0.567392417s end_delta=567.392687ms step=2s duration=182.608369ms status=200 limit=1000 returned_lines=0 throughput=36MB total_bytes=6.5MB total_bytes_structured_metadata=545kB lines_per_second=115739 total_lines=21135 post_filter_lines=471 total_entries=220 store_chunks_download_time=24.707476ms queue_time=472µs splits=2 shards=2 query_referenced_structured_metadata=false pipeline_wrapper_filtered_lines=0 chunk_refs_fetch_time=48.551824ms cache_chunk_req=6 cache_chunk_hit=6 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=689329 cache_chunk_download_time=21.677316ms cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=1 cache_stats_results_hit=1 cache_stats_results_download_time=2.568925ms cache_volume_results_req=0 cache_volume_results_hit=0 cache_volume_results_download_time=0s cache_result_req=1 cache_result_hit=0 cache_result_download_time=1.165712ms cache_result_query_length_served=0s cardinality_estimate=0 ingester_chunk_refs=0 ingester_chunk_downloaded=0 ingester_chunk_matches=57 ingester_requests=6 ingester_chunk_head_bytes=1.5MB ingester_chunk_compressed_bytes=681kB ingester_chunk_decompressed_bytes=4.3MB ingester_post_filter_lines=463 congestion_control_latency=0s index_total_chunks=0 index_post_bloom_filter_chunks=0 index_bloom_filter_ratio=0.00 index_used_bloom_filters=false index_shard_resolver_duration=0s source=grafana-lokiexplore-app disable_pipeline_wrappers=false has_labelfilter_before_parser=false
2025-03-19T11:42:53.985024768Z stderr F level=warn ts=2025-03-19T11:42:53.887254911Z caller=logging.go:128 orgID=1 msg="GET /loki/api/v1/patterns?query=%7Bcluster%3D%60xxx%60%2C%20service_name%3D%60loki%60%7D&start=2025-03-19T11%3A37%3A53.674Z&end=2025-03-19T11%3A42%3A53.674Z&step=500ms (500) 8.092476ms Response: \"empty ring\" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br, zstd; Accept-Language: en-GB,en-US;q=0.9,en;q=0.8; Priority: u=1, i; Sec-Ch-Ua: \"Not(A:Brand\";v=\"99\", \"Google Chrome\";v=\"133\", \"Chromium\";v=\"133\"; Sec-Ch-Ua-Mobile: ?0; Sec-Ch-Ua-Platform: \"macOS\"; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/11.4.0; X-Amzn-Trace-Id: Self=1-67daadbd-3ebb957258cfddf144e56c08;Root=1-67daadbd-51aba1722897875f01eaa5f3; X-Datasource-Uid: decsjur7gndhcb; X-Forwarded-For: 10.100.1.99, 10.100.2.186, 10.100.3.154; X-Forwarded-Port: 443; X-Forwarded-Proto: https; X-Grafana-Id: xxxxxxxxxxx; X-Grafana-Org-Id: 1; X-Grafana-Referer: https://grafana.xxx.com/a/grafana-lokiexplore-app/explore/service/loki/logs?from=now-5m&to=now&var-ds=decsjur7gndhcb&var-filters=cluster%7C%3D%7Cxxx&var-filters=service_name%7C%3D%7Cloki&var-fields=&var-levels=&patterns=%5B%5D&var-metadata=&var-patterns=&var-lineFilterV2=&var-lineFilters=caseInsensitive,0%7C__gfp__%3D%7Cerror&var-lineFilters=caseInsensitive,1%7C__gfp__%3D%7Crpc%20error&timezone=browser&urlColumns=%5B%5D&visualizationType=%22logs%22&displayedFields=%5B%5D&sortOrder=%22Descending%22&wrapLogMessage; X-Plugin-Id: loki; X-Query-Tags: Source=grafana-lokiexplore-app; X-Scope-Orgid: 1; "

As I have already mentioned, we can see all the logs, and Loki seems to work correctly.
But every time we open the logs for specific labels and try to change the time range, we get this error.
I would be very grateful if you could help, thank you in advance.

If any additional information is needed, I will gladly provide it. Thank you in advance.

  1. Try enabling the gateway, and attach your ALB to the gateway. The gateway included in the Helm chart already has all the routing rules you need (see the values sketch after this list).

  2. Try adding a catch-all rule for /loki/api/* and send it to the query frontend.

  3. Check your Grafana instance and see if there are any logs there when the error shows up.

  4. Check the ALB logs as well as the ingress logs and see what you can find when the error shows up.
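
For point 1, here is a rough sketch of what the values could look like (assuming the chart's gateway.ingress fields in 6.x; the ALB annotation values are simply carried over from your ingress block above):

gateway:
  enabled: true
  replicas: 2
  ingress:
    enabled: true
    ingressClassName: alb
    annotations:
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/backend-protocol: HTTP
      alb.ingress.kubernetes.io/group.name: ${ loki_ingress_group }
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80}, {"HTTPS":443}]'
      alb.ingress.kubernetes.io/ssl-redirect: '443'
      alb.ingress.kubernetes.io/certificate-arn: ${ loki_ingress_acm_certificate_arn }
    hosts:
      - host: ${ loki_host }
        paths:
          - path: /
            pathType: Prefix

# The gateway's bundled nginx config routes pushes to the distributor and the
# query endpoints to the query frontend, so the per-component ingress above
# (and a manual catch-all rule, point 2) is only needed if you skip the gateway.
ingress:
  enabled: false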