Attempting to use the Helm chart to deploy Tempo and having a bit of an issue right out of the gate.
Helm Chart/Version: grafana/tempo-distributed/0.9.13
K8s version: 1.20
When deploying, the service “…gossip-ring” does not obtain a cluster IP.
```
NAME                                               TYPE        CLUSTER-IP        PORT(S)                                          AGE
tempo-tempo-distributed-compactor                  ClusterIP   100.135.62.159    3100/TCP                                         31h
tempo-tempo-distributed-distributor                ClusterIP   100.132.32.149    3100/TCP,9095/TCP,55681/TCP,4317/TCP,55680/TCP   31h
tempo-tempo-distributed-gossip-ring                ClusterIP   None              7946/TCP                                         31h
tempo-tempo-distributed-ingester                   ClusterIP   100.130.141.136   3100/TCP,9095/TCP                                31h
tempo-tempo-distributed-memcached                  ClusterIP   100.133.223.43    11211/TCP,9150/TCP                               31h
tempo-tempo-distributed-querier                    ClusterIP   100.135.9.70      3100/TCP,9095/TCP                                31h
tempo-tempo-distributed-query-frontend             ClusterIP   100.128.68.231    3100/TCP,9095/TCP,16686/TCP,16687/TCP            31h
tempo-tempo-distributed-query-frontend-discovery   ClusterIP   None              3100/TCP,9095/TCP,16686/TCP,16687/TCP            31h
```
Also, the compactor, ingester, querier, and distributor pods are in CrashLoopBackOff due to this error:
```
failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided
```
Similar issue to https://github.com/grafana/helm-charts/issues/328
I could try changing the store to etcd, but I'm curious whether anyone else has seen this behaviour.
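For reference, switching a ring's kvstore to etcd would look something like this in the Tempo config (a sketch only; the etcd endpoint is illustrative, and the same kvstore block would be needed for the distributor and ingester rings too):

```yaml
# sketch: point the compactor ring at an existing etcd cluster instead of memberlist
compactor:
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints:
          - http://etcd.tracing.svc:2379   # illustrative endpoint
```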
It's expected that the gossip-ring service does not have a cluster IP; it's a headless service, so ClusterIP is explicitly set to None.
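For context, the chart renders the gossip-ring Service roughly like this (a simplified sketch; the selector labels are illustrative, not the chart's exact template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tempo-tempo-distributed-gossip-ring
spec:
  # headless: DNS returns the member pod IPs directly instead of a virtual IP,
  # which is what memberlist needs to discover its peers
  clusterIP: None
  ports:
    - name: gossip-ring
      port: 7946
      protocol: TCP
      targetPort: 7946
  selector:
    # illustrative; the chart selects all components that join the gossip ring
    app.kubernetes.io/name: tempo
```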
I haven't seen that error before. Can you share the memberlist part of Tempo's ConfigMap or the Helm values you are using?
It should contain something like this (the join address is the gossip-ring service, whose exact name depends on your release):
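```yaml
memberlist:
  abort_if_cluster_join_fails: false
  join_members:
    # rendered from {{ include "tempo.fullname" . }}-gossip-ring in the chart
    - tempo-tempo-distributed-gossip-ring
```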
I think this message is returned when the ring code cannot automatically find an IP. By default it looks for IPs on network adapters named eth0 and en0. Can you check what your adapter name is?
This can be changed like this:
```yaml
distributor:
  ring:
    instance_interface_names:
      - <whatever your interface name is>
```
For the ingester:
```yaml
ingester:
  lifecycler:
    ring:
      interface_names:
        - <whatever your interface name is>
```
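One way to check the adapter name is to list the network interfaces inside one of the crashing pods (the pod name below is illustrative; substitute one of yours from `kubectl get pods`):

```
kubectl exec -it tempo-tempo-distributed-ingester-0 -- ls /sys/class/net
```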
Here are the Helm values we're using:

```yaml
tempo:
  image:
    # -- The Docker registry
    registry: docker.io
    # -- Docker image repository
    repository: grafana/tempo
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: null
    pullPolicy: IfNotPresent
  readinessProbe:
    httpGet:
      path: /ready
      port: http
    initialDelaySeconds: 30
    timeoutSeconds: 1
# Configuration for the ingester
ingester:
  # -- Number of replicas for the ingester
  replicas: 1
  # -- Grace period to allow the ingester to shut down before it is killed. Especially for the ingester,
  # this must be increased. It must be long enough so ingesters can be gracefully shut down, flushing/transferring
  # all data and successfully leaving the member ring on shutdown.
  terminationGracePeriodSeconds: 300
  # -- Annotations for ingester pods
  podAnnotations: {}
  # -- Affinity for ingester pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.ingesterSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.ingesterSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
  persistence:
    # -- Enable creating PVCs which is required when using boltdb-shipper
    enabled: false
    # -- Size of persistent disk
    size: 10Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
distributor:
  # -- Number of replicas for the distributor
  replicas: 1
  service:
    # -- Type of service for the distributor
    type: ClusterIP
  # -- Grace period to allow the distributor to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for distributor pods
  podAnnotations: {}
  # -- Affinity for distributor pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.distributorSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.distributorSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
compactor:
  # -- Grace period to allow the compactor to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for compactor pods
  podAnnotations: {}
# Configuration for the querier
querier:
  # -- Number of replicas for the querier
  replicas: 1
  # -- Grace period to allow the querier to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for querier pods
  podAnnotations: {}
  # -- Affinity for querier pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.querierSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.querierSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
# Configuration for the query-frontend
queryFrontend:
  query:
    # -- Required for Grafana versions <7.5 for compatibility with jaeger-ui. Doesn't work on ARM arch
    enabled: true
    image:
      # -- Docker image repository for the query-frontend image. Overrides `tempo.image.repository`
      repository: grafana/tempo-query
    config: |
      backend: 127.0.0.1:3100
  # -- Number of replicas for the query-frontend
  replicas: 1
  service:
    # -- Annotations for queryFrontend service
    annotations: {}
    # -- Type of service for the queryFrontend
    type: ClusterIP
  # -- The name of the PriorityClass for query-frontend pods
  priorityClassName: null
  # -- Grace period to allow the query-frontend to shutdown before it is killed
  terminationGracePeriodSeconds: 30
  # -- Annotations for query-frontend pods
  podAnnotations: {}
  # -- Affinity for query-frontend pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.queryFrontendSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.queryFrontendSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
traces:
  jaeger:
    # -- Enable Tempo to ingest Jaeger gRPC traces
    grpc: false
    # -- Enable Tempo to ingest Jaeger Thrift Binary traces
    thriftBinary: false
    # -- Enable Tempo to ingest Jaeger Thrift Compact traces
    thriftCompact: false
    # -- Enable Tempo to ingest Jaeger Thrift HTTP traces
    thriftHttp: false
  # -- Enable Tempo to ingest Zipkin traces
  zipkin: false
  otlp:
    # -- Enable Tempo to ingest OpenTelemetry HTTP traces
    http: true
    # -- Enable Tempo to ingest OpenTelemetry gRPC traces
    grpc: true
  # -- Enable Tempo to ingest OpenCensus traces
  opencensus: false
config: |
  multitenancy_enabled: false
  compactor:
    compaction:
      block_retention: 48h
    ring:
      kvstore:
        store: memberlist
  distributor:
    ring:
      kvstore:
        store: memberlist
    receivers:
      {{- if or (.Values.traces.jaeger.thriftCompact) (.Values.traces.jaeger.thriftBinary) (.Values.traces.jaeger.thriftHttp) (.Values.traces.jaeger.grpc) }}
      jaeger:
        protocols:
          {{- if .Values.traces.jaeger.thriftCompact }}
          thrift_compact:
            endpoint: 0.0.0.0:6831
          {{- end }}
          {{- if .Values.traces.jaeger.thriftBinary }}
          thrift_binary:
            endpoint: 0.0.0.0:6832
          {{- end }}
          {{- if .Values.traces.jaeger.thriftHttp }}
          thrift_http:
            endpoint: 0.0.0.0:14268
          {{- end }}
          {{- if .Values.traces.jaeger.grpc }}
          grpc:
            endpoint: 0.0.0.0:14250
          {{- end }}
      {{- end }}
      {{- if .Values.traces.zipkin}}
      zipkin:
        endpoint: 0.0.0.0:9411
      {{- end }}
      {{- if or (.Values.traces.otlp.http) (.Values.traces.otlp.grpc) }}
      otlp:
        protocols:
          {{- if .Values.traces.otlp.http }}
          http:
            endpoint: 0.0.0.0:55681
          {{- end }}
          {{- if .Values.traces.otlp.grpc }}
          grpc:
            endpoint: 0.0.0.0:4317
          {{- end }}
      {{- end }}
      {{- if .Values.traces.opencensus}}
      opencensus:
        endpoint: 0.0.0.0:55678
      {{- end }}
  querier:
    frontend_worker:
      frontend_address: {{ include "tempo.queryFrontendFullname" . }}-discovery:9095
  ingester:
    lifecycler:
      ring:
        replication_factor: 1
        kvstore:
          store: memberlist
      tokens_file_path: /var/tempo/tokens.json
  memberlist:
    abort_if_cluster_join_fails: false
    join_members:
      - {{ include "tempo.fullname" . }}-gossip-ring
  overrides:
    per_tenant_override_config: /conf/overrides.yaml
  server:
    http_listen_port: 3100
  storage:
    trace:
      backend: {{.Values.storage.trace.backend}}
      {{- if eq .Values.storage.trace.backend "gcs"}}
      gcs:
        {{- toYaml .Values.storage.trace.gcs | nindent 6}}
      {{- end}}
      {{- if eq .Values.storage.trace.backend "s3"}}
      s3:
        {{- toYaml .Values.storage.trace.s3 | nindent 6}}
      {{- end}}
      {{- if eq .Values.storage.trace.backend "azure"}}
      azure:
        {{- toYaml .Values.storage.trace.azure | nindent 6}}
      {{- end}}
      blocklist_poll: 5m
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
      cache: memcached
      memcached:
        consistent_hash: true
        host: {{ include "tempo.fullname" . }}-memcached
        service: memcached-client
        timeout: 500ms

# To configure a different storage backend instead of local storage:
# storage:
#   trace:
#     backend: azure
#     azure:
#       container-name:
#       storage-account-name:
#       storage-account-key:
# -- the supported storage backends are gcs, s3 and azure
# -- as specified in https://grafana.com/docs/tempo/latest/configuration/#storage

# Set ingestion overrides
overrides: |
  overrides: {}
# memcached is for all of the Tempo pieces to coordinate with each other.
# You can use your own memcached by setting enabled: false and providing host and service.
memcached:
  # -- Specifies whether the memcached cache should be enabled
  enabled: true
  host: memcached
  # Number of replicas for memcached
  replicas: 1
  service: memcached-client
  # -- Memcached Docker image repository
  repository: memcached
  # -- Memcached Docker image tag
  tag: 1.5.17-alpine
  # -- Memcached Docker image pull policy
  pullPolicy: IfNotPresent
  # -- Additional CLI args for memcached
  extraArgs: []
  # -- Environment variables to add to memcached pods
  extraEnv: []
  # -- Environment variables from secrets or configmaps to add to memcached pods
  extraEnvFrom: []
  # -- Resource requests and limits for memcached
  resources: {}
  # -- Affinity for memcached pods. Passed through `tpl` and, thus, to be configured as string
  # @default -- Hard node and soft zone anti-affinity
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.memcachedSelectorLabels" . | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.memcachedSelectorLabels" . | nindent 12 }}
            topologyKey: failure-domain.beta.kubernetes.io/zone
memcachedExporter:
  # -- Specifies whether the Memcached Exporter should be enabled
  enabled: false
  image:
    # -- Memcached Exporter Docker image repository
    repository: prom/memcached-exporter
    # -- Memcached Exporter Docker image tag
    tag: v0.8.0
    # -- Memcached Exporter Docker image pull policy
    pullPolicy: IfNotPresent
  # -- Memcached Exporter resource requests and limits
  resources: {}

# ServiceMonitor configuration
serviceMonitor:
  # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
  enabled: false
  # -- Alternative namespace for ServiceMonitor resources
  namespace: null
  # -- Namespace selector for ServiceMonitor resources
  namespaceSelector: {}
  # -- ServiceMonitor annotations
  annotations: {}
  # -- Additional ServiceMonitor labels
  labels: {}
  # -- ServiceMonitor scrape interval
  interval: null
  # -- ServiceMonitor scrape timeout in Go duration format (e.g. 15s)
  scrapeTimeout: null
  # -- ServiceMonitor will use http by default, but you can pick https as well
  scheme: http
  # -- ServiceMonitor will use these tlsConfig settings to make the health check requests
  tlsConfig: null
```
Changing the pod IP range was not an option, so to solve this we had to explicitly set the memberlist bind address in the config file to the pod's IP address.
To do this, create an environment variable with the pod's IP (in values.yaml):
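Something like the following, assuming the chart passes `extraEnv` through to the component pods (`MY_POD_IP` is an illustrative name):

```yaml
ingester:
  extraEnv:
    # Kubernetes Downward API: expose the pod's own IP to the container
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
```

The memberlist section of the Tempo config can then bind to it, e.g. `bind_addr: ['${MY_POD_IP}']`, provided Tempo is started with `-config.expand-env=true` so the variable is expanded when the config is loaded.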