Hi everyone, I am deploying loki-distributed in a k8s cluster and am facing the following issue in promtail:
Response: "at least 1 live replicas required, could only find 0 - unhealthy instances: 172.39.2.135:9095\n"
The IP 172.39.2.135 belongs to the Loki ingester pod; its HTTP port is 3100.
This is my ingester config:
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_idle_period: 30m
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_retain_period: 1m
  max_transfer_retries: 0
  wal:
    dir: /var/loki/wal
/ # curl http://172.39.2.135:3100/ready
ready
/ # curl http://172.39.2.135:9095/ready
curl: (1) Received HTTP/0.9 when not allowed
/ #
The gRPC port 9095 is not working as expected.
I have already spent a lot of time on this, and at this point I really have no idea what is causing the error. Can anyone help me out here?
Thanks
The gRPC port doesn't respond to HTTP requests; it just has to be open to connectivity from itself (and the other Loki containers).
Check and see if there is any other error. Share your configuration if possible.
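For example, something along these lines should tell you whether the port is even reachable – curl will always complain about HTTP/0.9 there, because gRPC isn't plain HTTP/1.x (this assumes nc with -z is available in the promtail image, and that 3100 is your HTTP port):

  # plain TCP check against the ingester's gRPC port
  nc -vz 172.39.2.135 9095
  # memberlist status page on any member; the distributor also serves /ring with each ingester's state
  curl http://172.39.2.135:3100/memberlist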
Sure:
ingester:
  # -- Kind of deployment [StatefulSet/Deployment]
  kind: StatefulSet
  # -- Number of replicas for the ingester
  replicas: 1
  # -- hostAliases to add
  hostAliases: []
  #  - ip: 1.2.3.4
  #    hostnames:
  #      - domain.tld
  autoscaling:
    # -- Enable autoscaling for the ingester
    enabled: false
    # -- Minimum autoscaling replicas for the ingester
    minReplicas: 1
    # -- Maximum autoscaling replicas for the ingester
    maxReplicas: 3
    # -- Target CPU utilisation percentage for the ingester
    targetCPUUtilizationPercentage: 60
    # -- Target memory utilisation percentage for the ingester
    targetMemoryUtilizationPercentage: null
    # -- Allows one to define custom metrics using the HPA/v2 schema (for example, Pods, Object or External metrics)
    customMetrics: []
    # - type: Pods
    #   pods:
    #     metric:
    #       name: loki_lines_total
    #     target:
    #       type: AverageValue
    #       averageValue: 10k
    behavior:
      # -- Enable autoscaling behaviours
      enabled: false
      # -- define scale down policies, must conform to HPAScalingRules
      scaleDown: {}
      # -- define scale up policies, must conform to HPAScalingRules
      scaleUp: {}
  image:
    # -- The Docker registry for the ingester image. Overrides `loki.image.registry`
    registry: null
    # -- Docker image repository for the ingester image. Overrides `loki.image.repository`
    repository: null
    # -- Docker image tag for the ingester image. Overrides `loki.image.tag`
    tag: null
  # -- Command to execute instead of defined in Docker image
  command: null
  # -- The name of the PriorityClass for ingester pods
  priorityClassName: null
  # -- Labels for ingester pods
  podLabels: {}
  # -- Annotations for ingester pods
  podAnnotations: {}
  # -- Labels for ingestor service
  serviceLabels: {}
  # -- Additional CLI args for the ingester
  extraArgs: []
  # -- Environment variables to add to the ingester pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to the ingester pods
  extraEnvFrom: []
  # -- Volume mounts to add to the ingester pods
  extraVolumeMounts: []
  # -- Volumes to add to the ingester pods
  extraVolumes: []
  # -- Resource requests and limits for the ingester
  resources: {}
  # -- Containers to add to the ingester pods
  extraContainers: []
  # -- Init containers to add to the ingester pods
  initContainers: []
  # -- Grace period to allow the ingester to shutdown before it is killed. Especially for the ingestor,
  # this must be increased. It must be long enough so ingesters can be gracefully shutdown flushing/transferring
  # all data and to successfully leave the member ring on shutdown.
  terminationGracePeriodSeconds: 300
  # -- Lifecycle for the ingester container
  lifecycle: {}
  # -- topologySpread for ingester pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Defaults to allow skew no more then 1 node per AZ
  topologySpreadConstraints: |
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          {{- include "loki.ingesterSelectorLabels" . | nindent 6 }}
  # -- Affinity for ingester pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "loki.ingesterSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "loki.ingesterSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
  # -- Pod Disruption Budget maxUnavailable
  maxUnavailable: null
  # -- Max Surge for ingester pods
  maxSurge: 0
  # -- Node selector for ingester pods
  nodeSelector: {}
  # -- Tolerations for ingester pods
  tolerations: []
  # -- readiness probe settings for ingester pods. If empty, use `loki.readinessProbe`
  readinessProbe: {}
  # -- liveness probe settings for ingester pods. If empty use `loki.livenessProbe`
  livenessProbe: {}
  persistence:
    # -- Enable creating PVCs which is required when using boltdb-shipper
    enabled: false
    # -- Use emptyDir with ramdisk for storage. **Please note that all data in ingester will be lost on pod restart**
    inMemory: false
    # -- List of the ingester PVCs
    # @notationType -- list
    claims:
      - name: data
        size: 10Gi
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: null
      # - name: wal
      #   size: 150Gi
    # -- Enable StatefulSetAutoDeletePVC feature
    enableStatefulSetAutoDeletePVC: false
    whenDeleted: Retain
    whenScaled: Retain
  # -- Adds the appProtocol field to the ingester service. This allows ingester to work with istio protocol selection.
  appProtocol:
    # -- Set the optional grpc service protocol. Ex: "grpc", "http2" or "https"
    grpc: ""
Don’t see anything obviously wrong. Do you have your actual Loki configuration handy as well?
A couple of things to try:
- Try binding to 0.0.0.0 instead:

    memberlist:
      bind_addr: ['0.0.0.0']

- The list under join_members is supposed to contain valid DNS records, either for A-record service discovery or dnssrv. {{ include "loki.fullname" . }}-memberlist doesn't look quite right.
- You can tell memberlist not to abort if the cluster join fails, so that the container keeps running and is easier to troubleshoot (a combined sketch follows after this list):

    memberlist:
      abort_if_cluster_join_fails: false
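Putting those points together, the memberlist block would look roughly like this – just a sketch; the Service name loki-distributed-memberlist, the namespace loki, and port 7946 are assumptions, so substitute whatever headless memberlist Service your release actually creates:

    memberlist:
      bind_addr: ['0.0.0.0']
      abort_if_cluster_join_fails: false
      join_members:
        # assumed headless Service DNS name – must resolve to the gossip port of all Loki pods
        - loki-distributed-memberlist.loki.svc.cluster.local:7946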
I changed the config and redeployed the Helm chart, but I'm facing the same issue again.
What error messages are you seeing?
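If nothing obvious stands out, something like this should dump the recent logs from both sides (the namespace and label selectors are assumptions based on common loki-distributed and promtail chart labels – adjust them to your release):

  # ingester logs
  kubectl logs -n loki -l app.kubernetes.io/component=ingester --tail=100
  # promtail logs
  kubectl logs -n loki -l app.kubernetes.io/name=promtail --tail=100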