Response: \"at least 1 live replicas required, could only find 0 - unhealthy instances: 172.39.2.135:9095\\n\"

Hi everyone, I am deploying loki-distributed in a k8s cluster and I am facing the following issue

Response: "at least 1 live replicas required, could only find 0 - unhealthy instances: 172.39.2.135:9095\n"

in Promtail. The IP 172.39.2.135 belongs to the Loki ingester pod (its HTTP port is 3100).

This is my ingester config:

ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      chunk_idle_period: 30m
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_retain_period: 1m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal

/ # curl http://172.39.2.135:3100/ready
ready
/ # curl http://172.39.2.135:9095/ready
curl: (1) Received HTTP/0.9 when not allowed
/ #

The gRPC port 9095 is not responding as expected.

I have already spent a lot of time on this and I really have no idea what is causing the error. Can anyone help me out here?

Thanks

The gRPC port doesn't respond to HTTP requests; it just has to be open to connections from the pod itself and the other Loki containers.

Check and see if there is any other error. Share your configuration if possible.
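If you want to sanity-check connectivity, a plain TCP check is enough; the HTTP/0.9 error from curl is expected since the port speaks gRPC, not HTTP. For example (nc is just one option, assuming the netcat build in your pod supports these flags):

# From the promtail pod or another Loki pod; a successful connect is all that matters here.
nc -zv 172.39.2.135 9095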

Sure:

ingester:
  # -- Kind of deployment [StatefulSet/Deployment]
  kind: StatefulSet
  # -- Number of replicas for the ingester
  replicas: 1
  # -- hostAliases to add
  hostAliases: []
  #  - ip: 1.2.3.4
  #    hostnames:
  #      - domain.tld
  autoscaling:
    # -- Enable autoscaling for the ingester
    enabled: false
    # -- Minimum autoscaling replicas for the ingester
    minReplicas: 1
    # -- Maximum autoscaling replicas for the ingester
    maxReplicas: 3
    # -- Target CPU utilisation percentage for the ingester
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilisation percentage for the ingester
    targetMemoryUtilizationPercentage: null
    # -- Allows one to define custom metrics using the HPA/v2 schema (for example, Pods, Object or External metrics)
    customMetrics: []
    # - type: Pods
    #   pods:
    #     metric:
    #       name: loki_lines_total
    #     target:
    #       type: AverageValue
    #       averageValue: 10k
    behavior:
      # -- Enable autoscaling behaviours
      enabled: false
      # -- define scale down policies, must conform to HPAScalingRules
      scaleDown: {}
      # -- define scale up policies, must conform to HPAScalingRules
      scaleUp: {}
  image:
    # -- The Docker registry for the ingester image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the ingester image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the ingester image. Overrides `loki.image.tag`
    tag: null
  # -- Command to execute instead of defined in Docker image
  command: null
  # -- The name of the PriorityClass for ingester pods
  priorityClassName: null
  # -- Labels for ingester pods
  podLabels: {}
  # -- Annotations for ingester pods
  podAnnotations: {}
  # -- Labels for ingester service
  serviceLabels: {}
  # -- Additional CLI args for the ingester
  extraArgs: []
  # -- Environment variables to add to the ingester pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the ingester pods
  extraEnvFrom: []
  # -- Volume mounts to add to the ingester pods
  extraVolumeMounts: []
  # -- Volumes to add to the ingester pods
  extraVolumes: []
  # -- Resource requests and limits for the ingester
  resources: {}
  # -- Containers to add to the ingester pods
  extraContainers: []
  # -- Init containers to add to the ingester pods
  initContainers: []
  # -- Grace period to allow the ingester to shutdown before it is killed. Especially for the ingester,
  # this must be increased. It must be long enough so ingesters can be gracefully shut down flushing/transferring
  # all data and to successfully leave the member ring on shutdown.
  terminationGracePeriodSeconds: 300
  # -- Lifecycle for the ingester container
  lifecycle: {}
  # -- topologySpread for ingester pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Defaults to allow skew no more than 1 node per AZ
  topologySpreadConstraints: |
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          {{- include "loki.ingesterSelectorLabels" . | nindent 6 }}
  # -- Affinity for ingester pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "loki.ingesterSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "loki.ingesterSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
  # -- Pod Disruption Budget maxUnavailable
  maxUnavailable: null
  # -- Max Surge for ingester pods
  maxSurge: 0
  # -- Node selector for ingester pods
  nodeSelector: {}
  # -- Tolerations for ingester pods
  tolerations: []
  # -- readiness probe settings for ingester pods. If empty, use `loki.readinessProbe`
  readinessProbe: {}
  # -- liveness probe settings for ingester pods. If empty use `loki.livenessProbe`
  livenessProbe: {}
  persistence:
    # -- Enable creating PVCs which is required when using boltdb-shipper
    enabled: false
    # -- Use emptyDir with ramdisk for storage. **Please note that all data in ingester will be lost on pod restart**
    inMemory: false
    # -- List of the ingester PVCs
    # @notationType -- list
    claims:
      - name: data
        size: 10Gi
        #   -- Storage class to be used.
        #   If defined, storageClassName: <storageClass>.
        #   If set to "-", storageClassName: "", which disables dynamic provisioning.
        #   If empty or set to null, no storageClassName spec is
        #   set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: null
      # - name: wal
      #   size: 150Gi
    # -- Enable StatefulSetAutoDeletePVC feature
    enableStatefulSetAutoDeletePVC: false
    whenDeleted: Retain
    whenScaled: Retain
  # -- Adds the appProtocol field to the ingester service. This allows ingester to work with istio protocol selection.
  appProtocol:
    # -- Set the optional grpc service protocol. Ex: "grpc", "http2" or "https"
    grpc: ""

Don’t see anything obviously wrong. Do you have your actual Loki configuration handy as well?

Yes:

loki:
  # -- If set, these annotations are added to all of the Kubernetes controllers
  # (Deployments, StatefulSets, etc) that this chart launches. Use this to
  # implement something like the "Wave" controller or another controller that
  # is monitoring top level deployment resources.
  annotations: {}
  # -- Configures the readiness probe for all of the Loki pods
  readinessProbe:
    httpGet:
      path: /ready
      port: http
    initialDelaySeconds: 30
    timeoutSeconds: 1
  livenessProbe:
    httpGet:
      path: /ready
      port: http
    initialDelaySeconds: 300
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: null
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  # -- Common labels for all pods
  podLabels: {}
  # -- Common annotations for all pods
  podAnnotations: {}
  # -- Common command override for all pods (except gateway)
  command: null
  # -- The number of old ReplicaSets to retain to allow rollback
  revisionHistoryLimit: 10
  # -- The SecurityContext for Loki pods
  podSecurityContext:
    fsGroup: 10001
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001
  # -- The SecurityContext for Loki containers
  containerSecurityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
    allowPrivilegeEscalation: false
  # -- Specify an existing secret containing loki configuration. If non-empty, overrides loki.config
  existingSecretForConfig: ""
  # -- Store the loki configuration as a secret.
  configAsSecret: false
  # -- Annotations for the secret with loki configuration.
  configSecretAnnotations: {}
  # -- Additional labels for the secret with loki configuration.
  configSecretLabels: {}
  # -- Adds the appProtocol field to the memberlist service. This allows memberlist to work with istio protocol selection. Ex: "http" or "tcp"
  appProtocol: ""
  # -- Common annotations for all loki services
  serviceAnnotations: {}
  # -- Loki server configuration
  # Refers to the Grafana Loki configuration parameters documentation
  server:
    # -- HTTP server listen port
    http_listen_port: 3100
    grpc_listen_port: 9095
  # -- Config file contents for Loki
  # @default -- See values.yaml
  config: |
auth_enabled: false

server:
  {{- toYaml .Values.loki.server | nindent 6 }}

common:
  compactor_address: http://{{ include "loki.compactorFullname" . }}:3100

distributor:
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - {{ include "loki.fullname" . }}-memberlist
  bind_addr:
    - ${MY_POD_IP}
  #advertise_addr: ${POD_IP}

ingester_client:
  grpc_client_config:
    grpc_compression: gzip

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_idle_period: 30m
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_retain_period: 1m
  max_transfer_retries: 0
  wal:
    dir: /var/loki/wal

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  split_queries_by_interval: 15m

{{- if .Values.loki.schemaConfig}}
schema_config:
{{- toYaml .Values.loki.schemaConfig | nindent 2}}
{{- end}}
{{- if .Values.loki.storageConfig}}
storage_config:
{{- if .Values.indexGateway.enabled}}
{{- $indexGatewayClient := dict "server_address" (printf "dns:///%s:9095" (include "loki.indexGatewayFullname" .)) }}
{{- $_ := set .Values.loki.storageConfig.boltdb_shipper "index_gateway_client" $indexGatewayClient }}
{{- end}}
{{- toYaml .Values.loki.storageConfig | nindent 2}}
{{- if .Values.memcachedIndexQueries.enabled }}
  index_queries_cache_config:
    memcached_client:
      addresses: dnssrv+_memcached-client._tcp.{{ include "loki.memcachedIndexQueriesFullname" . }}.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}
      consistent_hash: true
{{- end}}
{{- end}}

runtime_config:
  file: /var/{{ include "loki.name" . }}-runtime/runtime.yaml

chunk_store_config:
  max_look_back_period: 0s
  {{- if .Values.memcachedChunks.enabled }}
  chunk_cache_config:
    embedded_cache:
      enabled: false
    memcached_client:
      consistent_hash: true
      addresses: dnssrv+_memcached-client._tcp.{{ include "loki.memcachedChunksFullname" . }}.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}
  {{- end }}
  {{- if .Values.memcachedIndexWrites.enabled }}
  write_dedupe_cache_config:
    memcached_client:
      consistent_hash: true
      addresses: dnssrv+_memcached-client._tcp.{{ include "loki.memcachedIndexWritesFullname" . }}.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}
  {{- end }}

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

query_range:
  align_queries_with_step: true
  max_retries: 5
  cache_results: true
  results_cache:
    cache:
      {{- if .Values.memcachedFrontend.enabled }}
      memcached_client:
        addresses: dnssrv+_memcached-client._tcp.{{ include "loki.memcachedFrontendFullname" . }}.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}
        consistent_hash: true
      {{- else }}
      embedded_cache:
        enabled: true
        ttl: 24h
      {{- end }}

frontend_worker:
  {{- if .Values.queryScheduler.enabled }}
  scheduler_address: {{ include "loki.querySchedulerFullname" . }}:9095
  {{- else }}
  frontend_address: {{ include "loki.queryFrontendFullname" . }}-headless:9095
  {{- end }}

frontend:
  log_queries_longer_than: 5s
  compress_responses: true
  {{- if .Values.queryScheduler.enabled }}
  scheduler_address: {{ include "loki.querySchedulerFullname" . }}:9095
  {{- end }}
  tail_proxy_url: http://{{ include "loki.querierFullname" . }}:3100

compactor:
  shared_store: filesystem
  working_directory: /var/loki/compactor

ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  alertmanager_url: https://alertmanager.xx
  external_url: https://alertmanager.xx

  schemaConfig:
    configs:
      - from: "2020-09-07"
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: loki_index_
          period: 24h

  storageConfig:
    boltdb_shipper:
      shared_store: filesystem
      active_index_directory: /var/loki/index
      cache_location: /var/loki/cache
      cache_ttl: 168h
    filesystem:
      directory: /var/loki/chunks
    # -- Uncomment to configure each storage individually
    # azure: {}
    # gcs: {}
    # s3: {}
    # boltdb: {}

Couple of things to try:

  1. Try binding to 0.0.0.0 instead:

memberlist:
  bind_addr: ['0.0.0.0']

  2. The list under join_members is supposed to contain valid DNS records, either for A-record service discovery or dnssrv. {{ include "loki.fullname" . }}-memberlist doesn't look quite right; check that the name actually resolves (see the check after this list).

  3. You can also set memberlist not to abort if the cluster join fails, so that you can troubleshoot more easily with the container still running:

memberlist:
  abort_if_cluster_join_fails: false
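For point 2, a quick way to check what that template renders to and whether it resolves (the names below are placeholders; use whatever service and namespace your release actually created):

# Find the headless memberlist service created by the chart (the exact name depends on your release).
kubectl get svc -n <namespace> | grep memberlist

# Check that it resolves to the ring members' pod IPs. The Loki images are minimal,
# so a throwaway busybox pod is an easy place to run nslookup from.
kubectl run dns-test --rm -it --restart=Never --image=busybox -n <namespace> -- \
  nslookup <release>-loki-distributed-memberlist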

I changed the config and redeployed the Helm chart, but I am facing the same issue again.

level=warn ts=2024-06-25T08:28:15.471738932Z caller=logging.go:123 traceID=49e7eefb1868e86a orgID=fake msg="POST /loki/api/v1/push (500) 401.433µs Response: "at least 1 live replicas required, could only find 0 - unhealthy instances: 172.39.3.250:9095\n" ws: false; Accept: /; Connection: close; Content-Length: 311; Content-Type: application/json; User-Agent: curl/7.81.0; "

What error messages are you seeing?
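It can also be worth looking at the ring status page, which every ring member serves on the HTTP port (3100, not the gRPC port), to see what state the instance at 172.39.3.250 is in:

# The distributor (or any ingester) serves this; replace the address with your pod or service.
curl http://<distributor-or-ingester>:3100/ring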