Tempo Vulture fails to query Tempo 90-95% of the time

Hello, I tried to deploy Tempo and Tempo Vulture on Kubernetes, but it seems like a high number of Tempo Vulture queries are failing (404), even though the ingester seems OK.

Here is a bit of context:
Custom Helm chart values:

storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: grafana-tempo-storage
traces:
  jaeger:
    # -- Enable Tempo to ingest Jaeger GRPC traces
    grpc: true
    # -- Enable Tempo to ingest Jaeger Thrift Binary traces
    thriftBinary: false
    # -- Enable Tempo to ingest Jaeger Thrift Compact traces
    thriftCompact: false
    # -- Enable Tempo to ingest Jaeger Thrift HTTP traces
    thriftHttp: false
  # -- Enable Tempo to ingest Zipkin traces
  zipkin: false
  otlp:
    # -- Enable Tempo to ingest OpenTelemetry HTTP traces
    http: false
    # -- Enable Tempo to ingest OpenTelemetry GRPC traces
    grpc: true
  # -- Enable Tempo to ingest OpenCensus traces
  opencensus: false

For Tempo Vulture:

tempoAddress:
  # -- the url towards your Tempo distributor, e.g. http://distributor
  push: http://tempo-tempo-distributed-distributor
  # -- the url towards your Tempo query-frontend, e.g. http://query-frontend:3100
  query: http://tempo-tempo-distributed-query-frontend:3100

Ingester metrics:

Tempo Vulture Logs:

E0827 09:37:13.720721       1 main.go:217] tempo url http://tempo-tempo-distributed-query-frontend:3100/api/traces/7f77b5ead87c970307da776e48f53d01
E0827 09:37:43.720688       1 main.go:217] tempo url http://tempo-tempo-distributed-query-frontend:3100/api/traces/312c83960a99aba1078f439f1339691f
E0827 09:37:43.733165       1 main.go:220] trace not found 312c83960a99aba1078f439f1339691f
E0827 09:37:43.733193       1 main.go:115] error querying Tempo trace not found
E0827 09:38:13.720435       1 main.go:217] tempo url http://tempo-tempo-distributed-query-frontend:3100/api/traces/2ec0d3ff939e977d5f3d81178d499b0b
E0827 09:38:13.738969       1 main.go:220] trace not found 2ec0d3ff939e977d5f3d81178d499b0b
E0827 09:38:13.738997       1 main.go:115] error querying Tempo trace not found

Query Frontend Logs:

level=info ts=2021-08-27T09:36:43.72913336Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=5c085d48f9b70c94 url=/api/traces/6a4128219711c2d03bd036a55c15bf4a duration=2.174074ms response_size=0 status=404
level=info ts=2021-08-27T09:37:13.748972787Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6ff7dfa7785bc705 url=/api/traces/7f77b5ead87c970307da776e48f53d01 duration=19.823453ms response_size=31175 status=200
level=info ts=2021-08-27T09:37:43.730438653Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=7dfd9078215a3d0a url=/api/traces/312c83960a99aba1078f439f1339691f duration=4.114527ms response_size=0 status=404
level=info ts=2021-08-27T09:38:13.736302197Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=1f2c2de172362a17 url=/api/traces/2ec0d3ff939e977d5f3d81178d499b0b duration=3.890153ms response_size=0 status=404
level=info ts=2021-08-27T09:38:43.731232202Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=430b8396d46bc714 url=/api/traces/1d1d191169694e534c925012e2057f8b duration=2.919309ms response_size=0 status=404

But once in a while, the querier succeeds in finding the trace:

level=info ts=2021-08-27T07:59:43.74124059Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=273151edb55a0b2a url=/api/traces/15004a4f0a6d3ff47b504d2aad6719ea duration=12.496063ms response_size=4221 status=200
level=info ts=2021-08-27T08:21:13.745419617Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=58d9faad6d056415 url=/api/traces/40c7db016c97f94c6b3440cc84d1276b duration=16.035543ms response_size=19685 status=200
level=info ts=2021-08-27T08:21:43.730745961Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=723bbd08e52005e8 url=/api/traces/740c1be16a79090b6fd7658a9d6cc9eb duration=3.221735ms response_size=0 status=404
level=info ts=2021-08-27T08:23:13.733750734Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=4369729f9260c327 url=/api/traces/20041be668d2413169a69085b0af1dbe duration=3.215038ms response_size=0 status=404
level=info ts=2021-08-27T08:29:13.732850912Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=64d61ed72a7e4ee5 url=/api/traces/4ed8edabad630fbb70acc146420076f4 duration=4.500166ms response_size=0 status=404
level=info ts=2021-08-27T08:46:13.731772424Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=5326badbda65e07a url=/api/traces/30a3b3cd751cbc6f3bd64eaf11028f74 duration=3.12009ms response_size=0 status=404
level=info ts=2021-08-27T08:48:43.73743803Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6217c14e4d43bc40 url=/api/traces/2c95635565f805fe26c24d03880def8a duration=11.208868ms response_size=12860 status=200
level=info ts=2021-08-27T08:52:13.732720859Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6200951b87e460ef url=/api/traces/086cecfed1ce09b622f022967fa5cf2f duration=3.246412ms response_size=0 status=404
level=info ts=2021-08-27T09:01:44.836208631Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=636faee5ae8ac280 url=/api/traces/66ccbf1cff7118945ecd5f0f2a4e8758 duration=15.751986ms response_size=19375 status=200
level=info ts=2021-08-27T09:10:43.784799736Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=7a08465341a3ecb1 url=/api/traces/716e343c8d14ee9011712534887cae51 duration=57.297725ms response_size=37153 status=200
level=info ts=2021-08-27T09:11:43.837460553Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=75ea476fe1f90c53 url=/api/traces/6f36ca7e6561cc7138bae5eaa601a4d6 duration=109.769497ms response_size=11489 status=200
level=info ts=2021-08-27T09:14:13.731200855Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=3d23f042ae6fcdb3 url=/api/traces/33d9be3f677c8b1d066cdfc75ac999b4 duration=3.028284ms response_size=0 status=404
level=info ts=2021-08-27T09:21:43.732868039Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=17f6c643bf11a2d7 url=/api/traces/6207fca5ac776c64020011603cf47c4a duration=3.197556ms response_size=0 status=404
level=info ts=2021-08-27T09:37:13.748972787Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6ff7dfa7785bc705 url=/api/traces/7f77b5ead87c970307da776e48f53d01 duration=19.823453ms response_size=31175 status=200
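
For reference, the same lookup can be reproduced by hand from inside the cluster against the query-frontend. A minimal sketch, using a throwaway curl pod and one of the 404 trace IDs from the logs above (pod name and image choice are just for illustration):

# one-off pod that replays one of the vulture's lookups directly against the query-frontend
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sv http://tempo-tempo-distributed-query-frontend:3100/api/traces/312c83960a99aba1078f439f1339691f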

I have no clue where the issue could be; if you can provide some investigation tips, it would be appreciated!

Your logs look different from ours; your vulture may be a bit out of date. Are you logging any lines for writing traces? I see a lot of these:

T=2021-08-27T15:13:09.525Z L=INFO M="sending trace" org_id=1 write_trace_id=64dcd96e2bac3068549e83f862b9d172 seed=1630077180
T=2021-08-27T15:13:19.526Z L=INFO M="sending trace" org_id=1 write_trace_id=6606fff5aa56988222802b7fd5b21901 seed=1630077190
T=2021-08-27T15:13:29.526Z L=INFO M="sending trace" org_id=1 write_trace_id=54b067ddeda8c7c2708801005d279488 seed=1630077200
T=2021-08-27T15:13:39.525Z L=INFO M="sending trace" org_id=1 write_trace_id=58ba146613e0eda32dcd2b88d65f240f seed=1630077210

in our logs. We just cut 1.1. Perhaps try the vulture from that release?

Here are our CLI params:

      -prometheus-listen-address=:8080
      -tempo-push-url=http://distributor
      -tempo-query-url=http://query-frontend:3100
      -logtostderr=true
      -tempo-org-id=1
      -tempo-read-backoff-duration=5s
      -tempo-write-backoff-duration=10s

Hello @joeelliott, indeed I used the Helm chart that deploys tempo-vulture:0.7.
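
For anyone else, bumping the vulture image should just be a values override along these lines (release and chart names are placeholders, and the image.tag path is an assumption about the chart's values layout):

# hypothetical override; the exact value path depends on the chart you deploy the vulture with,
# and the tag is assumed to match the Tempo 1.1 release
helm upgrade <release> <chart> --reuse-values --set image.tag=1.1.0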

I switched to v1.1 and finally managed to find the error: the compactor was missing a permission on the bucket (storage.objects.get). I did not see it in the logs at first, and I wasn't expecting the compactor to block the whole service, but it appears it can.
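
For anyone hitting the same thing, granting the Tempo service account a role that includes storage.objects.get on the bucket looks roughly like this (the service account and project below are placeholders; roles/storage.objectAdmin is one role that covers it, the bucket name is the one from the values above):

# grant the Tempo service account object access on the trace bucket (names are placeholders)
gsutil iam ch \
  serviceAccount:tempo@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://grafana-tempo-storage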

Thanks for your help,