I have a 3-node Tempo cluster (v1.0.1, installed via binaries) and I can't figure out why the ingester ring won't form. Each server is behind a load balancer; when I visit /compactor/ring I see all 3 nodes, but /ingester/ring only shows 1 server.
My configs are identical on all 3 nodes, and I have Consul set up for memberlist discovery. I also see traces landing in the S3 bucket, so the write path looks OK.
```yaml
memberlist:
  bind_port: 7946
  join_members:
    - tempo.service.consul:7946

overrides:
  ingestion_rate_strategy: global

server:
  http_listen_port: 3200
  http_listen_address: 0.0.0.0
  grpc_listen_port: 9095
  grpc_listen_address: 0.0.0.0
  log_level: debug

storage:
  trace:
    backend: s3
    s3:
      bucket: <bucket>
      endpoint: <endpoint>

distributor:
  ring:
    instance_interface_names:
      - ens5
    kvstore:
      store: memberlist
  log_received_traces: true
  receivers:
    zipkin:

ingester:
  lifecycler:
    interface_names:
      - ens5
    ring:
      replication_factor: 3

compactor:
  ring:
    instance_interface_names:
      - ens5
    kvstore:
      store: memberlist

querier:
  frontend_worker:
    frontend_address: tempo.service.consul:9095

query_frontend:
  query_shards: 3
  instance_interface_names:
    - ens5
```
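For what it's worth, one thing I've noticed is that `instance_interface_names` only pins the ring components to ens5, not the memberlist gossip itself. If I'm reading the memberlist config options right (`bind_addr` / `advertise_addr` — please correct me if those aren't the right keys in 1.0.1), I'd try pinning gossip explicitly with something like this on each node:

```yaml
memberlist:
  bind_port: 7946
  join_members:
    - tempo.service.consul:7946
  # Assumed option names; the address would be node-specific,
  # i.e. the ens5 IP of each server:
  bind_addr:
    - <ens5 ip of this node>
  advertise_addr: <ens5 ip of this node>
```

Not sure if that's the actual cause here, but it would rule out gossip binding to the wrong interface.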
My assumption is that if replication_factor is set to 3 and the ingester ring has formed correctly, any Tempo server should be able to handle the query.
What I am finding is that older queries return correctly (I assume because they are served from S3), but queries for new traces randomly return a 404. I believe this is caused by the ingester ring not forming, so the query isn't routed to the server that ingested/replicated the trace.
The /memberlist page shows both compactor and distributor under "KV Store", and all 3 servers are healthy under "Memberlist Cluster Members", so I don't think this is a firewall/connectivity issue.
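To rule out gossip traffic being dropped node-to-node (rather than to the load balancer), I also checked the gossip port directly between the servers with something like this (the addresses are placeholders for my three nodes):

```shell
# Placeholder addresses for the three Tempo nodes.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
for node in $NODES; do
  # memberlist gossips on bind_port 7946; -z just probes, -w sets a 2s timeout
  nc -z -w 2 "$node" 7946 && echo "tcp 7946 reachable: $node"
done
```

All three came back reachable, which is why I don't think it's network-level.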
The only debug log of note is:
caller=mock.go:149 msg="Get - deadline exceeded" key=collectors/ring
Any help is appreciated.