Tempo distributed helm-chart configuration

Hi there,

We are trying to use Tempo, deployed with the tempo-distributed Helm chart. Although the Helm release installs successfully, I can see errors in the logs and we are not able to ingest traces.

distributor logs:

ts=2023-10-25T11:44:27.123569974Z caller=memberlist_logger.go:74 level=info msg="Marking grafana-tempo-distributor-58cc77f749-gfs2t-e678e20c as failed, suspect timeout reached (2 peer confirmations)"
level=warn ts=2023-10-25T11:44:30.786848372Z caller=tcp_transport.go:254 component="memberlist TCPTransport" msg="failed to read message type" err=EOF remote=10.68.3.57:38880
level=warn ts=2023-10-25T11:44:33.121271085Z caller=tcp_transport.go:438 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.68.3.59:7946 err="dial tcp 10.68.3.59:7946: i/o timeout"
level=warn ts=2023-10-25T11:44:38.122348155Z caller=tcp_transport.go:438 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.68.3.57:7946 err="dial tcp 10.68.3.57:7946: i/o timeout"
level=warn ts=2023-10-25T11:44:43.123416271Z caller=tcp_transport.go:438 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.68.3.59:7946 err="dial tcp 10.68.3.59:7946: i/o timeout"
level=warn ts=2023-10-25T11:44:56.122244891Z caller=tcp_transport.go:438 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.68.3.59:7946 err="dial tcp 10.68.3.59:7946: i/o timeout"

Can someone please advise on what might be wrong in the configuration?

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: grafana-tempo
  namespace: tempo
spec:
  interval: 30m
  chart:
    spec:
      chart: tempo-distributed
      version: "~1"
      sourceRef:
        kind: HelmRepository
        name: grafana-charts
        namespace: tempo
  values:
    serviceAccount:
      name: sample-service
    multitenancy_enabled: false
    compactor:
      compaction:
        block_retention: 48h
      ring:
        kvstore:
          store: memberlist
    distributor:
      receivers:
        otlp:
          protocols:
            grpc:
    ingester:
      lifecycler:
        ring:
          replication_factor: 1
      persistence:
        size: 25Gi
        storageClass:
          storageClassName: regional-storage
    traces:
      otlp:
        grpc:
          enabled: true
    memberlist:
      abort_if_cluster_join_fails: false
    server:
      http_listen_port: 3100
    storage:
      trace:
        backend: gcs
        gcs:
          bucket_name: xxxxxxx
      pool:
        queue_depth: 2000
      wal:
        path: /var/tempo/wal
      memcached:
        consistent_hash: true
        host: xxx
        service: memcached-client
        timeout: 500ms

This error indicates that memberlist gossip is failing. The pods gossip ring state to each other on this port (7946), which is how the distributors become aware of the ingesters so they can route traffic to them.
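
For context, the memberlist section of the rendered Tempo config (the ConfigMap the chart generates) usually looks something like the sketch below. The gossip-ring service name is a guess based on the chart's conventions, so compare it against your actual ConfigMap:

memberlist:
  bind_port: 7946                   # the port that is failing in your logs
  abort_if_cluster_join_fails: false
  join_members:
  - grafana-tempo-gossip-ring:7946  # headless service the chart creates; exact name may differ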

Here are a few ideas:

  1. Check the cluster networking layer: is there a rule (a NetworkPolicy, firewall, or similar) that blocks this traffic between the pods? (A sample allow rule is sketched after this list.)
  2. Browse to localhost:3100/memberlist (the server.http_listen_port you configured; the Tempo default is 3200) on a few distributor and ingester pods - does each pod see any of the others? That helps determine whether the issue is isolated to a specific set of pods or affects all of them.
  3. [Purely for testing!] You can hard-code the ring members, which will tell you whether the problem is with connection or with discovery. The Tempo config YAML looks like this (I'm not sure exactly how this is exposed in the Helm chart):
memberlist:
  join_members:
  - dns+<pod>:7946       # repeat for each pod
  - dns+ingester-0:7946  # example
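
On idea 1: if your cluster runs a CNI that enforces NetworkPolicies and something is denying pod-to-pod traffic, an allow rule for the gossip port would look roughly like the sketch below. The policy name and pod labels are placeholders, so adjust them to match the labels on your Tempo pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-tempo-gossip           # placeholder name
  namespace: tempo
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: tempo  # placeholder; match your Tempo pods' labels
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: tempo
    ports:
    - protocol: TCP
      port: 7946                     # memberlist gossip port seen in the logs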
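
It can also help to confirm that the headless gossip-ring Service exists and actually selects your pods, since that is what memberlist discovery resolves against. The manifest below is only a sketch of what to look for (the name and labels are guesses; compare with the Services and Endpoints in your namespace):

apiVersion: v1
kind: Service
metadata:
  name: grafana-tempo-gossip-ring   # guess; use the name that actually exists in your namespace
  namespace: tempo
spec:
  clusterIP: None                   # headless, so DNS returns the member pod IPs directly
  ports:
  - name: gossip-ring
    port: 7946
    protocol: TCP
    targetPort: 7946
  selector:
    app.kubernetes.io/name: tempo   # guess; must match the labels on your Tempo pods
    app.kubernetes.io/instance: grafana-tempo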