[Ingester] Save the search data permanently

Hello! I have a tempo-distributed installation.

We use the search mechanism:

search_enabled: true

The problem: after the ingester restarts, the data in /var/tempo/wal/search/ is reset. Is there anything I can do to change this behavior? My configuration:

multitenancy_enabled: false
search_enabled: true

server:
  http_listen_port: 3100

distributor:
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: 0.0.0.0:14250
        thrift_binary:
          endpoint: 0.0.0.0:6832
        thrift_compact:
          endpoint: 0.0.0.0:6831
        thrift_http:
          endpoint: 0.0.0.0:14268

querier:
  frontend_worker:
    frontend_address: {{ include "tempo.queryFrontendFullname" . }}:9095

ingester:
  max_block_duration: 5m
  complete_block_timeout: 72h
  lifecycler:
    ring:
      replication_factor: 2

memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:
    - {{ include "tempo.fullname" . }}-gossip-ring

compactor:
  compaction:
    block_retention: 720h
    compacted_block_retention: 1h
    compaction_window: 1h
  ring:
    kvstore:
      store: memberlist

storage:
  trace:
    backend: s3
    block:
      bloom_filter_false_positive: .05
      index_downsample_bytes: 1000
      encoding: zstd
    s3:
      bucket: tempo
      endpoint: xxxxxxxxxxxx
      access_key: xxxxxxxxxxxx
      secret_key: xxxxxxxxxxxx
    pool:
      queue_depth: 10000
      max_workers: 100
    wal:
      path: /var/tempo/wal
      encoding: snappy

overrides:
  max_bytes_per_trace: 5000000

Hi, Tempo v1.2+ should preserve the search data. I think what you are seeing is expected behavior: WAL replay after an ingester restart. Tempo writes traces to the write-ahead log in /var/tempo/wal/<uuid file>, and the matching search data to /var/tempo/wal/search/<uuid file>. Every few minutes the WAL files are moved and re-saved as backend blocks in /var/tempo/wal/blocks/<tenant>/<uuid> (the timing is controlled by the ingester max_block_duration and max_block_bytes config options). On restart, existing WAL files are replayed and moved immediately instead of waiting. So it looks like /search/ was reset, but the previous data was actually moved to /blocks/.

Are you seeing missing search results?

We can follow some steps to confirm the data is still present after a restart.

  1. While ingester is running, list the contents of /var/tempo/wal/search/ and note a block’s uuid.
  2. Restart the ingester and wait a few moments.
  3. Verify the same uuid is now present in /var/tempo/wal/blocks/<tenant>/<uuid> and it contains search-related files.
  4. Check ingester logs during restart for the following messages:
    4.1 "beginning wal replay"
    4.2 "beginning replay" file=<uuid>....
    4.3 "replay complete" file=<uuid>...
    4.4 "wal replay complete"
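
The directory checks in steps 1 and 3 can also be scripted. This is only a rough Python sketch under the WAL layout described above (/var/tempo/wal/<uuid file>, /var/tempo/wal/search/<uuid file>, /var/tempo/wal/blocks/<tenant>/<uuid>). The helper name locate_block is made up for illustration and is not part of Tempo, and it assumes WAL and search file names begin with the block uuid.

```python
from pathlib import Path


def locate_block(wal_root, block_id):
    """Report where entries whose name starts with block_id live under wal_root."""
    root = Path(wal_root)
    found = {"wal": [], "search": [], "blocks": []}

    # Trace WAL files sit directly in the WAL root.
    if root.is_dir():
        found["wal"] = sorted(
            p.name for p in root.iterdir()
            if p.is_file() and p.name.startswith(block_id)
        )

    # Matching search data sits in <wal_root>/search/.
    search_dir = root / "search"
    if search_dir.is_dir():
        found["search"] = sorted(
            p.name for p in search_dir.iterdir()
            if p.name.startswith(block_id)
        )

    # Completed blocks sit in <wal_root>/blocks/<tenant>/<uuid>/.
    for block_dir in sorted(root.glob("blocks/*/" + block_id)):
        if block_dir.is_dir():
            files = sorted(f.name for f in block_dir.iterdir())
            found["blocks"].append((block_dir.parent.name, files))
    return found
```

Run locate_block("/var/tempo/wal", "<uuid>") once before the restart and once after. If replay worked, the uuid should disappear from "wal" and "search" and show up under "blocks" together with its file list.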

Thanks for the reply!
I ran the diagnostics you suggested:

caller=ingester.go:330 msg="beginning wal replay"
msg="beginning replay" file=ebbb68bc-8025-4492-b33c-ccf001f57989:single-tenant:v2:snappy:v1 size=11611
msg="replay complete" file=ebbb68bc-8025-4492-b33c-ccf001f57989:single-tenant:v2:snappy:v1 duration=1.101357ms
msg="beginning replay" file=ebbb68bc-8025-4492-b33c-ccf001f57989:single-tenant:v2:none: size=16786
msg="replay complete" file=ebbb68bc-8025-4492-b33c-ccf001f57989:single-tenant:v2:none: duration=1.121087ms
/ # ls -l /var/tempo/wal/blocks/single-tenant | grep ebbb68bc-8025-4492-b33c-ccf001f57989
drwxr-xr-x    2 root     root          4096 Nov 18 13:36 ebbb68bc-8025-4492-b33c-ccf001f57989
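
For anyone with more than a couple of WAL files, matching the replay messages by eye gets tedious. Here is a small Python sketch that pairs each "beginning replay" line with its "replay complete" line. It assumes the msg="..." file=... shape seen in the log excerpt above; the function name unfinished_replays is hypothetical.

```python
import re

# Matches msg="beginning replay" or msg="replay complete" plus the file= field.
# The outer msg="beginning wal replay" line does not match and is skipped.
_REPLAY_LINE = re.compile(r'msg="(beginning replay|replay complete)"\s+file=(\S+)')


def unfinished_replays(log_text):
    """Return files that logged "beginning replay" but never "replay complete"."""
    started = []
    finished = set()
    for line in log_text.splitlines():
        match = _REPLAY_LINE.search(line)
        if not match:
            continue
        event, name = match.groups()
        if event == "beginning replay":
            started.append(name)
        else:
            finished.add(name)
    return [name for name in started if name not in finished]
```

An empty list means every replay that started also finished, which is what the excerpt above shows.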

After that, I checked trace search in jaeger-ui (tempo-query) and it was almost empty :worried:

Only one trace reached Tempo after the restart. Before the restart, search showed about 20 of them.

It looks as if search is performed only on new blocks, which are located in /var/tempo/wal/search/.
Everything in /var/tempo/wal/blocks/<tenant>/<uuid> seems to be ignored by Tempo.

Thanks for the info and screenshot. It looks like the issue is that the service dropdown is not fully populated after a restart. That is true: Tempo 1.2.1 only populates the dropdowns from data received since the last restart. If your traffic is infrequent, this can be a problem. We have a PR in progress to improve this, so that the dropdowns are populated from all data.

Regardless of what the UI shows, the search API will work as expected. Try curling http://<tempo>:3200/api/search?service.name=<name> (substitute your own HTTP port; the config above sets http_listen_port: 3100). It will find traces for the service even if it is not in the dropdown. Tempo-query (jaeger-query) won’t allow search without a service, but Grafana will. A workaround would be to use the experimental search UI in Grafana 8.2+ (8.3 recommended), and type service.name=<name> in the Tags field, leaving the service dropdown empty. Some instructions to enable this feature in Grafana are here.
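
If curl is not handy, the same request can be made from Python. This is just a sketch of the call described above, with tags passed as plain query parameters. The base URL is a placeholder for your own Tempo address and port, the function names are made up, and nothing is assumed about the response beyond it being JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, tags):
    """Build <base_url>/api/search?<tag>=<value>&... from a dict of tags."""
    return base_url.rstrip("/") + "/api/search?" + urlencode(tags)


def search(base_url, tags, timeout=10):
    """Send the search request and return the decoded JSON response."""
    with urlopen(build_search_url(base_url, tags), timeout=timeout) as resp:
        return json.load(resp)
```

For example, search("http://<tempo>:3200", {"service.name": "<name>"}) sends the same request as the curl command, with the tag values URL-encoded.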

Thanks for the reply!
Your solution seems to work :slightly_smiling_face: