Attempting to use the Helm chart to deploy Tempo and having a bit of an issue right out of the gate.
Helm Chart/Version: grafana/tempo-distributed/0.9.13
K8s version: 1.20
When deploying, the service “…gossip-ring” does not obtain a cluster IP.
```
NAME                                               TYPE        CLUSTER-IP        PORT(S)                                          AGE
tempo-tempo-distributed-compactor                  ClusterIP   100.135.62.159    3100/TCP                                         31h
tempo-tempo-distributed-distributor                ClusterIP   100.132.32.149    3100/TCP,9095/TCP,55681/TCP,4317/TCP,55680/TCP   31h
tempo-tempo-distributed-gossip-ring                ClusterIP   None              7946/TCP                                         31h
tempo-tempo-distributed-ingester                   ClusterIP   100.130.141.136   3100/TCP,9095/TCP                                31h
tempo-tempo-distributed-memcached                  ClusterIP   100.133.223.43    11211/TCP,9150/TCP                               31h
tempo-tempo-distributed-querier                    ClusterIP   100.135.9.70      3100/TCP,9095/TCP                                31h
tempo-tempo-distributed-query-frontend             ClusterIP   100.128.68.231    3100/TCP,9095/TCP,16686/TCP,16687/TCP            31h
tempo-tempo-distributed-query-frontend-discovery   ClusterIP   None              3100/TCP,9095/TCP,16686/TCP,16687/TCP            31h
```
Also, the compactor, ingester, querier, and distributor pods are in CrashLoopBackOff due to this error:
```
failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided
```
Similar issue to https://github.com/grafana/helm-charts/issues/328
I could try changing the store to etcd, but I'm curious whether anyone else has seen this behaviour.
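For reference, switching a ring's kvstore to etcd would look something like this in the Tempo config (a sketch only; the etcd endpoint is illustrative, and the same kvstore block would be needed for the distributor and ingester rings too):

```yaml
# sketch: point the compactor ring at an existing etcd cluster instead of memberlist
compactor:
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints:
          - http://etcd.tracing.svc:2379   # illustrative endpoint
```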
It's expected that the gossip-ring service does not have a cluster IP; it's a headless service, so ClusterIP is explicitly set to None.
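For context, the chart renders the gossip-ring Service roughly like this (a simplified sketch; the selector labels are illustrative, not the chart's exact template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tempo-tempo-distributed-gossip-ring
spec:
  # headless: DNS returns the member pod IPs directly instead of a virtual IP,
  # which is what memberlist needs to discover its peers
  clusterIP: None
  ports:
    - name: gossip-ring
      port: 7946
      protocol: TCP
      targetPort: 7946
  selector:
    # illustrative; the chart selects all components that join the gossip ring
    app.kubernetes.io/name: tempo
```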
I haven't seen that error before. Can you share the memberlist part of Tempo's ConfigMap or the Helm values you are using?
It should contain something like this (the join address is the gossip-ring service, whose exact name depends on your release):
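```yaml
memberlist:
  abort_if_cluster_join_fails: false
  join_members:
    # rendered from {{ include "tempo.fullname" . }}-gossip-ring in the chart
    - tempo-tempo-distributed-gossip-ring
```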
I think this message is returned when the ring code cannot automatically find an IP. By default it looks for IPs on network adapters named eth0 and en0. Can you check what your adapter name is?
This can be changed like this:
```yaml
distributor:
  ring:
    instance_interface_names:
      - <whatever your interface name is>
```
For the ingester:
```yaml
ingester:
  lifecycler:
    ring:
      interface_names:
        - <whatever your interface name is>
```
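One way to check the adapter name is to list the network interfaces inside one of the crashing pods (the pod name below is illustrative; substitute one of yours from `kubectl get pods`):

```
kubectl exec -it tempo-tempo-distributed-ingester-0 -- ls /sys/class/net
```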
Here are the Helm values we're using:

```yaml
tempo:
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/tempo
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: null
    pullPolicy: IfNotPresent
  readinessProbe:
    httpGet:
      path: /ready
      port: http
    initialDelaySeconds: 30
    timeoutSeconds: 1
# Configuration for the ingester
ingester:
  # -- Number of replicas for the ingester
  replicas: 1
  # -- Grace period to allow the ingester to shut down before it is killed. Especially for the ingester,
  # this must be increased. It must be long enough so ingesters can be gracefully shut down, flushing/transferring
  # all data and successfully leaving the member ring on shutdown.
  terminationGracePeriodSeconds: 300
  # -- Annotations for ingester pods
  podAnnotations: {}
  # -- Affinity for ingester pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.ingesterSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.ingesterSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
  persistence:
    # -- Enable creating PVCs which is required when using boltdb-shipper
    enabled: false
    # -- Size of persistent disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
distributor:
  # -- Number of replicas for the distributor
  replicas: 1
  service:
    # -- Type of service for the distributor
    type: ClusterIP
  # -- Grace period to allow the distributor to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for distributor pods
  podAnnotations: {}
  # -- Affinity for distributor pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.distributorSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.distributorSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
compactor:
  # -- Grace period to allow the compactor to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for compactor pods
  podAnnotations: {}
# Configuration for the querier
querier:
  # -- Number of replicas for the querier
  replicas: 1
  # -- Grace period to allow the querier to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for querier pods
  podAnnotations: {}
  # -- Affinity for querier pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.querierSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.querierSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
# Configuration for the query-frontend
queryFrontend:
  query:
    # -- Required for Grafana versions <7.5 for compatibility with jaeger-ui. Doesn't work on ARM arch
    enabled: true
    image:
      # -- Docker image repository for the query-frontend image. Overrides `tempo.image.repository`
      repository: grafana/tempo-query
    config: |
      backend: 127.0.0.1:3100
  # -- Number of replicas for the query-frontend
  replicas: 1
  service:
    # -- Annotations for queryFrontend service
    annotations: {}
    # -- Type of service for the queryFrontend
    type: ClusterIP
  # -- The name of the PriorityClass for query-frontend pods
  priorityClassName: null
  # -- Grace period to allow the query-frontend to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for query-frontend pods
  podAnnotations: {}
  # -- Affinity for query-frontend pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.queryFrontendSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.queryFrontendSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
traces:
  jaeger:
    # -- Enable Tempo to ingest Jaeger gRPC traces
    grpc: false
    # -- Enable Tempo to ingest Jaeger Thrift Binary traces
    thriftBinary: false
    # -- Enable Tempo to ingest Jaeger Thrift Compact traces
    thriftCompact: false
    # -- Enable Tempo to ingest Jaeger Thrift HTTP traces
    thriftHttp: false
  # -- Enable Tempo to ingest Zipkin traces
  zipkin: false
  otlp:
    # -- Enable Tempo to ingest OpenTelemetry HTTP traces
    http: true
    # -- Enable Tempo to ingest OpenTelemetry gRPC traces
    grpc: true
  # -- Enable Tempo to ingest OpenCensus traces
  opencensus: false
config: |
  multitenancy_enabled: false
  compactor:
    compaction:
      block_retention: 48h
    ring:
      kvstore:
        store: memberlist
  distributor:
    ring:
      kvstore:
        store: memberlist
    receivers:
      {{- if or (.Values.traces.jaeger.thriftCompact) (.Values.traces.jaeger.thriftBinary) (.Values.traces.jaeger.thriftHttp) (.Values.traces.jaeger.grpc) }}
      jaeger:
        protocols:
          {{- if .Values.traces.jaeger.thriftCompact }}
          thrift_compact:
            endpoint: 0.0.0.0:6831
          {{- end }}
          {{- if .Values.traces.jaeger.thriftBinary }}
          thrift_binary:
            endpoint: 0.0.0.0:6832
          {{- end }}
          {{- if .Values.traces.jaeger.thriftHttp }}
          thrift_http:
            endpoint: 0.0.0.0:14268
          {{- end }}
          {{- if .Values.traces.jaeger.grpc }}
          grpc:
            endpoint: 0.0.0.0:14250
          {{- end }}
      {{- end }}
      {{- if .Values.traces.zipkin}}
      zipkin:
        endpoint: 0.0.0.0:9411
      {{- end }}
      {{- if or (.Values.traces.otlp.http) (.Values.traces.otlp.grpc) }}
      otlp:
        protocols:
          {{- if .Values.traces.otlp.http }}
          http:
            endpoint: 0.0.0.0:55681
          {{- end }}
          {{- if .Values.traces.otlp.grpc }}
          grpc:
            endpoint: 0.0.0.0:4317
          {{- end }}
      {{- end }}
      {{- if .Values.traces.opencensus}}
      opencensus:
        endpoint: 0.0.0.0:55678
      {{- end }}
  querier:
    frontend_worker:
      frontend_address: {{ include "tempo.queryFrontendFullname" . }}-discovery:9095
  ingester:
    lifecycler:
      ring:
        replication_factor: 1
        kvstore:
          store: memberlist
      tokens_file_path: /var/tempo/tokens.json
  memberlist:
    abort_if_cluster_join_fails: false
    join_members:
      - {{ include "tempo.fullname" . }}-gossip-ring
  overrides:
    per_tenant_override_config: /conf/overrides.yaml
  server:
    http_listen_port: 3100
  storage:
    trace:
      backend: {{.Values.storage.trace.backend}}
      {{- if eq .Values.storage.trace.backend "gcs"}}
      gcs:
        {{- toYaml .Values.storage.trace.gcs | nindent 6}}
      {{- end}}
      {{- if eq .Values.storage.trace.backend "s3"}}
      s3:
        {{- toYaml .Values.storage.trace.s3 | nindent 6}}
      {{- end}}
      {{- if eq .Values.storage.trace.backend "azure"}}
      azure:
        {{- toYaml .Values.storage.trace.azure | nindent 6}}
      {{- end}}
      blocklist_poll: 5m
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
      cache: memcached
      memcached:
        consistent_hash: true
        host: {{ include "tempo.fullname" . }}-memcached
        service: memcached-client
        timeout: 500ms

# To configure a different storage backend instead of local storage:
# storage:
#   trace:
#     backend: azure
#     azure:
#       container-name:
#       storage-account-name:
#       storage-account-key:
# -- the supported storage backends are gcs, s3 and azure
# -- as specified in https://grafana.com/docs/tempo/latest/configuration/#storage

# Set ingestion overrides
overrides: |
  overrides: {}
# memcached is for all of the Tempo pieces to coordinate with each other.
# You can use your own memcached by setting enabled: false and providing host and service.
memcached:
  # -- Specifies whether the memcached cache should be enabled
  enabled: true
  host: memcached
  # Number of replicas for memcached
  replicas: 1
  service: memcached-client
  # -- Memcached Docker image repository
  repository: memcached
  # -- Memcached Docker image tag
  tag: 1.5.17-alpine
  # -- Memcached Docker image pull policy
  pullPolicy: IfNotPresent
  # -- Additional CLI args for memcached
  extraArgs: []
  # -- Environment variables to add to memcached pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to memcached pods
  extraEnvFrom: []
  # -- Resource requests and limits for memcached
  resources: {}
  # -- Affinity for memcached pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.memcachedSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.memcachedSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
memcachedExporter:
  # -- Specifies whether the Memcached Exporter should be enabled
  enabled: false
  image:
    # -- Memcached Exporter Docker image repository
    repository: prom/memcached-exporter
    # -- Memcached Exporter Docker image tag
    tag: v0.8.0
    # -- Memcached Exporter Docker image pull policy
    pullPolicy: IfNotPresent
  # -- Memcached Exporter resource requests and limits
  resources: {}

# ServiceMonitor configuration
serviceMonitor:
  # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
  enabled: false
  # -- Alternative namespace for ServiceMonitor resources
  namespace: null
  # -- Namespace selector for ServiceMonitor resources
  namespaceSelector: {}
  # -- ServiceMonitor annotations
  annotations: {}
  # -- Additional ServiceMonitor labels
  labels: {}
  # -- ServiceMonitor scrape interval
  interval: null
  # -- ServiceMonitor scrape timeout in Go duration format (e.g. 15s)
  scrapeTimeout: null
  # -- ServiceMonitor will use http by default, but you can pick https as well
  scheme: http
  # -- ServiceMonitor will use these tlsConfig settings to make the health check requests
  tlsConfig: null
```
Changing the pod IP range was not an option, so to solve this we had to explicitly set the memberlist bind address in the config file to the pod's IP address.
To do this, create an environment variable with the pod's IP (in values.yaml):
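Something like the following, assuming the chart passes `extraEnv` through to the component pods (`MY_POD_IP` is an illustrative name):

```yaml
ingester:
  extraEnv:
    # Kubernetes Downward API: expose the pod's own IP to the container
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
```

The memberlist section of the Tempo config can then bind to it, e.g. `bind_addr: ['${MY_POD_IP}']`, provided Tempo is started with `-config.expand-env=true` so the variable is expanded when the config is loaded.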