Tempo ingester ring not forming

dpires · August 19, 2021, 3:58pm

Hi all,

I have a 3-node Tempo cluster (v1.0.1, installed from binaries) and I can’t figure out why the ingester ring will not form. Each server sits behind a load balancer. When I visit /compactor/ring I see all 3 nodes, but /ingester/ring only displays 1 server.

My configs are identical on all three servers. I have Consul set up for memberlist. I also see traces in the S3 bucket, so everything else looks OK.

memberlist:
  bind_port: 7946
  join_members:
  - tempo.service.consul:7946

overrides:
    ingestion_rate_strategy: global

server:
  http_listen_port: 3200
  http_listen_address: 0.0.0.0
  grpc_listen_port: 9095
  grpc_listen_address: 0.0.0.0
  log_level: debug

storage:
  trace:
    backend: s3
    s3:
      bucket: <bucket>
      endpoint: <endpoint>

distributor:
  ring:
    instance_interface_names:
     - ens5
    kvstore:
      store: memberlist
  log_received_traces: true
  receivers:
    zipkin:

ingester:
  lifecycler:
    interface_names:
    - ens5
    ring:
      replication_factor: 3

compactor:
  ring:
    instance_interface_names:
    - ens5
    kvstore:
      store: memberlist

querier:
  frontend_worker:
    frontend_address: tempo.service.consul:9095

query_frontend:
  query_shards: 3
  instance_interface_names:
   - ens5

My assumption is that if replication_factor is set to 3 and the ingester ring has formed correctly, any Tempo server should be able to handle the query.

What I am finding is that queries for older traces return correctly (I assume because they are served from S3), but queries for new traces randomly return a 404. I believe this is caused by the ingester ring not forming, so the query does not reach the server that ingested/replicated the trace.

The /memberlist API shows both compactor and distributor under “KV Store”, and all 3 servers are healthy under “Memberlist Cluster Members”, so I don’t think this is a firewall/connection issue.

The only debug logs of note are:

caller=mock.go:149 msg="Get - deadline exceeded" key=collectors/ring

Any help is appreciated.

joeelliott · August 19, 2021, 6:48pm

Currently HA mode only works with the distributed components (ingester/distributor/querier/query-frontend).

We’ve discussed single binary scalable mode a bit here:

It’s possible that just cleaning up the mentioned lines would enable single binary scalable mode, but we’ve never really experimented with it. Based on your description, it sounds like that might be the cause.

If you want to pursue running Tempo in the traditional distributed mode here are some resources:

dpires · August 19, 2021, 7:06pm

I see, so my ingester issue with single binary mode is due to the defaults being set to inmemory and replication_factor “1” and not checking for overrides.
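
For reference, here is a minimal sketch of what an explicit ingester ring block using memberlist might look like, reusing the interface name and replication factor from the config posted above. The posted config sets `kvstore` for the distributor and compactor rings but not for the ingester ring. Per the reply above, v1.0.1 in single binary mode apparently does not honor these settings, so treat this as an illustration of the intended values rather than a confirmed fix:

```yaml
ingester:
  lifecycler:
    interface_names:
    - ens5
    ring:
      kvstore:
        store: memberlist    # not set in the posted config, so the default applies
      replication_factor: 3  # single binary mode reportedly forces this to 1
```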

I’ll take a look at traditional distributed mode, and experiment with allowing overrides for single binary mode.

Thanks!

joeelliott · August 19, 2021, 7:15pm

No problem. If you’re looking to do a high volume Tempo install I would recommend the distributed mode anyway as it allows more flexibility in scaling the different pieces.
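
As a rough sketch of what distributed mode means in config terms: each process is started with a single target instead of running every component in one binary. The file name in the comment is hypothetical, and the assumption is that the rest of the config (memberlist, storage, ring settings) is shared across the processes:

```yaml
# Hypothetical per-process file, e.g. ingester.yaml.
# Assumes the memberlist, storage and ring sections are the same for every process.
target: ingester   # one process each for distributor, ingester, querier, query-frontend, compactor
```

The same selection can usually be made with the `-target` command-line flag, leaving one shared config file for all processes.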

I feel like the HA scalable mode is an in-between option. Maybe in the 100k spans/second range. Dunno.

