Tempo and Istio sidecar / TLS

Hello,

I’m having trouble getting Tempo to form a cluster in our Kubernetes environment with Istio sidecar injection enabled.

The chart version is 1.7.6 and the Tempo version is 2.3.1.

I’m just using the default memberlist configuration from the Helm chart at the moment (roughly the block sketched after the list below). What I’ve observed so far:

  1. The cluster members seem to see each other.
  2. The ingester ring shows all 3 instances as active.
  3. The OTel Collector is reporting failures when sending traces to Tempo.
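
For context, the memberlist block that the chart renders into the Tempo config looks roughly like this on my install (paraphrased from memory, so the gossip-ring Service name, port and exact defaults may differ by chart version; check the rendered ConfigMap rather than trusting this sketch):

memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946                     # standard memberlist gossip port
  join_members:
    - tempo-gossip-ring:7946          # headless Service created by the chart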

A small sample of the logs from the distributor pod:

ts=2024-04-15T14:34:56.423346625Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-metrics-generator-6db5d6c957-qd995-a6dea97d has failed, no acks received"
ts=2024-04-15T14:35:17.149165621Z caller=memberlist_logger.go:74 level=warn msg="Refuting a suspect message (from: tempo-distributor-779bf4b8b5-rknjd-5b72c049)"
ts=2024-04-15T14:35:36.424108468Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-querier-56f8ddb954-rt7gx-343c0d65 has failed, no acks received"
ts=2024-04-15T14:36:16.424535696Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-ingester-0-12d36193 has failed, no acks received"
ts=2024-04-15T14:36:56.425139431Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-ingester-1-735e9f31 has failed, no acks received"
ts=2024-04-15T14:37:36.425567983Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-ingester-2-c1c8dfcc has failed, no acks received"
ts=2024-04-15T14:38:16.426850145Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-ingester-0-12d36193 has failed, no acks received"
ts=2024-04-15T14:38:17.197186645Z caller=memberlist_logger.go:74 level=warn msg="Refuting a suspect message (from: tempo-distributor-779bf4b8b5-rknjd-5b72c049)"
ts=2024-04-15T14:38:38.541117871Z caller=memberlist_logger.go:74 level=info msg="Marking tempo-querier-56f8ddb954-rt7gx-343c0d65 as failed, suspect timeout reached (1 peer confirmations)"
ts=2024-04-15T14:38:56.427816163Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-querier-56f8ddb954-rt7gx-343c0d65 has failed, no acks received"

At this point the Tempo cluster never settles into a healthy state, even though kubectl shows all pods as Running and passing their health checks:

❯ kubectl get pods -n tempo
NAME                                       READY   STATUS    RESTARTS      AGE
tempo-compactor-7f5df84f46-spvt2           2/2     Running   2 (40m ago)   40m
tempo-distributor-779bf4b8b5-rknjd         2/2     Running   2 (39m ago)   40m
tempo-ingester-0                           2/2     Running   1 (39m ago)   39m
tempo-ingester-1                           2/2     Running   1 (39m ago)   39m
tempo-ingester-2                           2/2     Running   1 (39m ago)   39m
tempo-memcached-0                          2/2     Running   0             39m
tempo-metrics-generator-6db5d6c957-qd995   2/2     Running   2 (39m ago)   39m
tempo-querier-56f8ddb954-rt7gx             2/2     Running   1 (39m ago)   39m
tempo-query-frontend-9bcd84c88-hw626       2/2     Running   1 (39m ago)   39m

Does anyone have an idea of what is going on? I’m seeing the same behavior with Loki and Mimir.

The cluster is running on a single-node kind Kubernetes instance on my laptop, so nothing at scale here.

❯ kind --version
kind version 0.22.0

❯ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

Hey, I’ve tried to install tempo-distributed on Istio and ran into a few problems. For now I have a seemingly working example here, though with a slightly newer Helm chart.

So, what’s different from the default configuration:

  1. I use a Helm post-renderer to make sure that the Service ports have appProtocol set (see the sketch after this list)
  2. In the same post-render script I patch the join_members gossip-ring DNS entry to use the fully qualified name (not sure that it is needed)
    P.S.: yes, it turned out to be redundant
  3. In the Helm values I changed the memberlist settings so that nodes periodically re-resolve the gossip DNS and rejoin the ring:
memberlist:
  rejoin_interval: 60s
  dead_node_reclaim_time: 60s
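
For reference, here is roughly what the post-render step from point 1 does. This is a minimal sketch using yq rather than the exact script from my repo, and post-render.sh is just the name I use locally; you may need to harden it (e.g. for empty documents in the rendered output):

#!/usr/bin/env bash
# post-render.sh - invoked by Helm via --post-renderer: reads the rendered
# manifests on stdin and writes the patched manifests to stdout.
# It sets appProtocol on every Service port that does not already have one, so
# the Istio sidecar uses explicit protocol selection instead of trying to
# sniff the memberlist gossip traffic.
set -euo pipefail
yq e '(select(.kind == "Service") | .spec.ports[] | select(.appProtocol == null) | .appProtocol) = "tcp"' -

It gets wired into the install like this:

helm upgrade --install tempo grafana/tempo-distributed -n tempo -f values.yaml --post-renderer ./post-render.sh

After that, the gossip-ring Service comes out looking roughly like the following (the Service name, port name and port number are whatever your chart version renders; 7946 is the usual memberlist port):

apiVersion: v1
kind: Service
metadata:
  name: tempo-gossip-ring
spec:
  clusterIP: None            # headless; memberlist uses it for peer discovery
  ports:
    - name: gossip-ring
      port: 7946
      protocol: TCP
      appProtocol: tcp       # explicit protocol selection for the Istio sidecar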