Random unavailability of queriers to frontend in "read" mode

We are running Loki 2.8.2 with split read/write paths. On the read nodes, the service randomly fails to start working and emits repeated caller=frontend.go:342 msg="not ready: number of queriers connected to query-frontend is 0" error messages. It only begins working after the service has been restarted a few times. Can somebody please help? Here is my config:

target: read

auth_enabled: true

common:
  storage:
    s3:
      access_key_id: xxxxx
      bucketnames: loki
      endpoint: xxxxx
      http_config:
        insecure_skip_verify: true
      insecure: false
      region: default
      s3forcepathstyle: true
      secret_access_key: xxxxx
compactor:
  compaction_interval: 10m
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: s3
  working_directory: /loki/compactor
distributor:
  ring:
    kvstore:
      store: memberlist
ingester:
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  lifecycler:
    final_sleep: 0s
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 1000
  ingestion_rate_mb: 10000
  max_entries_limit_per_query: 1000000
  max_global_streams_per_user: 10000
  max_label_name_length: 10240
  max_label_value_length: 20480
  max_streams_per_user: 0
  reject_old_samples: false
  reject_old_samples_max_age: 720h
  retention_period: 336h
memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:
  - xxxxxxxxxx:7946
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
querier:
  multi_tenant_queries_enabled: true
schema_config:
  configs:
  - from: '2020-05-15'
    index:
      period: 24h
      prefix: index_
    object_store: s3
    schema: v11
    store: boltdb-shipper
  - from: '2023-03-04'
    index:
      period: 24h
      prefix: index_tsdb_
    object_store: s3
    schema: v12
    store: tsdb
server:
  grpc_listen_port: 9443
  grpc_server_max_concurrent_streams: 1000
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_listen_port: 8443
  http_tls_config:
    cert_file: /etc/loki/ssl/cert.crt
    key_file: /etc/loki/ssl/cert.key
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
  tsdb_shipper:
    active_index_directory: /loki/tsdb-shipper-active
    cache_location: /loki/tsdb-shipper-cache
    shared_store: s3

Thank you in advance!

FYI, I have submitted the following issue on GitHub for this problem (it looks like it might be a bug): Random unavailability of queriers to frontend in "read" mode · Issue #9559 · grafana/loki · GitHub
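
For anyone who wants to spot this condition in their own logs, here is a minimal sketch of a filter that counts the "not ready" lines quoted above. It assumes the log lines contain that message text verbatim. The function names `connected_queriers` and `count_not_ready` are hypothetical helpers, not part of Loki.

```python
import re

# Matches the message text quoted in the post; the trailing number is the
# querier count reported by the query-frontend.
NOT_READY = re.compile(
    r"not ready: number of queriers connected to query-frontend is (\d+)"
)


def connected_queriers(line):
    """Return the querier count reported by a 'not ready' line, else None."""
    match = NOT_READY.search(line)
    return int(match.group(1)) if match else None


def count_not_ready(lines):
    """Count log lines that report zero connected queriers."""
    return sum(1 for line in lines if connected_queriers(line) == 0)
```

Feeding it the read node's log output (for example, `count_not_ready(open(path))` with a path of your choosing) gives a rough signal for when a restart was needed. It does not explain why the queriers fail to connect.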
