Tempo ingester ring not forming

Hi all,

I have a 3-node Tempo cluster set up (v1.0.1, via binaries) and I can’t figure out why the ingester ring will not form. Each server is behind a load balancer. When I visit /compactor/ring I see all 3 nodes, but /ingester/ring only displays 1 server.

My configs are identical on all three nodes. I have Consul set up for memberlist. I also see traces in the S3 bucket, so ingestion itself looks OK.

memberlist:
  bind_port: 7946
  join_members:
  - tempo.service.consul:7946

overrides:
    ingestion_rate_strategy: global

server:
  http_listen_port: 3200
  http_listen_address: 0.0.0.0
  grpc_listen_port: 9095
  grpc_listen_address: 0.0.0.0
  log_level: debug

storage:
  trace:
    backend: s3
    s3:
      bucket: <bucket>
      endpoint: <endpoint>

distributor:
  ring:
    instance_interface_names:
     - ens5
    kvstore:
      store: memberlist
  log_received_traces: true
  receivers:
    zipkin:

ingester:
  lifecycler:
    interface_names:
    - ens5
    ring:
      replication_factor: 3

compactor:
  ring:
    instance_interface_names:
    - ens5
    kvstore:
      store: memberlist

querier:
  frontend_worker:
    frontend_address: tempo.service.consul:9095

query_frontend:
  query_shards: 3
  instance_interface_names:
   - ens5

My assumption is that if replication_factor is set to 3 and the ingester ring forms correctly, any Tempo server should be able to handle the query.

What I am finding is that queries for older traces return correctly (I assume they are served from S3), but queries for new traces randomly return a 404. I believe this is caused by the ingester ring not forming, so the query does not hit the server that ingested/replicated the trace.

The /memberlist API shows both compactor and distributor under “KV Store”, and all 3 servers are healthy under “Memberlist Cluster Members”, so I don’t think this is a firewall/connection issue.

The only debug logs of note are:

caller=mock.go:149 msg="Get - deadline exceeded" key=collectors/ring

Any help is appreciated.

Currently HA mode only works with the distributed components (ingester/distributor/querier/query-frontend).

We’ve discussed single binary scalable mode a bit here:

It’s possible that just cleaning up the mentioned lines would enable single binary scalable mode, but we’ve never really experimented with it either. Based on your description, it sounds like that might be it.

If you want to pursue running Tempo in the traditional distributed mode, here are some resources:

I see, so my ingester issue is that single binary mode sets the ingester ring defaults to inmemory and replication_factor “1”, and does not check for overrides.

I’ll take a look at traditional distributed mode, and experiment with allowing overrides for single binary mode.
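For reference, here is a rough sketch of the ingester block I would expect to need in distributed mode. It mirrors the kvstore setting already present in the distributor and compactor blocks above. The kvstore entry under the ingester ring is my assumption of what is missing from the original config, and going by this thread it would not take effect in single binary mode on v1.0.1.

```yaml
# Sketch only, not a verified config: ingester ring backed by memberlist.
# Assumes the process runs as a separate component (e.g. -target=ingester).
# In single binary mode on v1.0.1 the ring store is reportedly forced to
# inmemory with replication_factor 1, so these values would be ignored there.
ingester:
  lifecycler:
    interface_names:
    - ens5
    ring:
      replication_factor: 3
      kvstore:
        store: memberlist   # assumption: same store as the distributor/compactor rings

memberlist:
  bind_port: 7946
  join_members:
  - tempo.service.consul:7946   # taken from the original config above
```

If the ring forms, /ingester/ring should list all three ingesters, the same way /compactor/ring already lists all three nodes.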

Thanks!

No problem. If you’re looking to do a high-volume Tempo install, I would recommend the distributed mode anyway, as it allows more flexibility in scaling the different pieces.

I feel like the HA scalable mode is an in-between option. Maybe in the 100k spans/second range. Dunno 🙂