I have a 3-node Tempo cluster (v1.0.1, installed via binaries) and I can't figure out why the ingester ring won't form. Each server is behind a load balancer; when I visit /compactor/ring I see all 3 nodes, but /ingester/ring only shows 1 server.
My configs are identical on all 3 nodes, and I have Consul set up for memberlist discovery. I also see traces landing in the S3 bucket, so the write path looks OK.
```yaml
memberlist:
  bind_port: 7946
  join_members:
    - tempo.service.consul:7946

overrides:
  ingestion_rate_strategy: global

server:
  http_listen_port: 3200
  http_listen_address: 0.0.0.0
  grpc_listen_port: 9095
  grpc_listen_address: 0.0.0.0
  log_level: debug

storage:
  trace:
    backend: s3
    s3:
      bucket: <bucket>
      endpoint: <endpoint>

distributor:
  ring:
    instance_interface_names:
      - ens5
    kvstore:
      store: memberlist
  log_received_traces: true
  receivers:
    zipkin:

ingester:
  lifecycler:
    interface_names:
      - ens5
    ring:
      replication_factor: 3

compactor:
  ring:
    instance_interface_names:
      - ens5
    kvstore:
      store: memberlist

querier:
  frontend_worker:
    frontend_address: tempo.service.consul:9095

query_frontend:
  query_shards: 3
  instance_interface_names:
    - ens5
```
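For what it's worth, one thing I've noticed is that `instance_interface_names` only pins the ring components to ens5, not the memberlist gossip itself. If I'm reading the memberlist config options right (`bind_addr` / `advertise_addr` — please correct me if those aren't the right keys in 1.0.1), I'd try pinning gossip explicitly with something like this on each node:

```yaml
memberlist:
  bind_port: 7946
  join_members:
    - tempo.service.consul:7946
  # Assumed option names; the address would be node-specific,
  # i.e. the ens5 IP of each server:
  bind_addr:
    - <ens5 ip of this node>
  advertise_addr: <ens5 ip of this node>
```

Not sure if that's the actual cause here, but it would rule out gossip binding to the wrong interface.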
My assumption is that if replication_factor is set to 3 and the ingester ring has formed correctly, any Tempo server should be able to handle the query.
What I am finding is that older queries return correctly (I assume because they are served from S3), but queries for new traces randomly return a 404. I believe this is caused by the ingester ring not forming, so the query isn't routed to the server that ingested/replicated the trace.
The /memberlist page shows both compactor and distributor under "KV Store", and all 3 servers are healthy under "Memberlist Cluster Members", so I don't think this is a firewall/connectivity issue.
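To rule out gossip traffic being dropped node-to-node (rather than to the load balancer), I also checked the gossip port directly between the servers with something like this (the addresses are placeholders for my three nodes):

```shell
# Placeholder addresses for the three Tempo nodes.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
for node in $NODES; do
  # memberlist gossips on bind_port 7946; -z just probes, -w sets a 2s timeout
  nc -z -w 2 "$node" 7946 && echo "tcp 7946 reachable: $node"
done
```

All three came back reachable, which is why I don't think it's network-level.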
The only debug log of note is:
caller=mock.go:149 msg="Get - deadline exceeded" key=collectors/ring
Any help is appreciated.