I’m running Grafana Tempo via the tempo-distributed Helm chart, and I’m seeing far fewer traces than expected; traces also appear to “disappear” after some time, even though I’ve configured long retention and persistence. I’d appreciate help sanity-checking my setup and understanding where traces might be getting lost.
Local Setup:
- Tempo chart: tempo-distributed 2.0.0 (from https://grafana-community.github.io/helm-charts)
- Storage backend: Tempo `storage.trace.backend = local` (the default), with PVCs on the ingester
Config YAML:
```yaml
tempo-distributed:
  enabled: true
  traces:
    otlp:
      http:
        enabled: true
      grpc:
        enabled: true
  metricsGenerator:
    enabled: true
  ingester:
    persistence:
      enabled: true
      size: 50Gi
      enableStatefulSetRecreationForSizeChange: true
  compactor:
    config:
      compaction:
        block_retention: 720h # 30 days
        compacted_block_retention: 1h
        compaction_window: 1h
        compaction_cycle: 30s
```
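To confirm these overrides actually reach Tempo, I’ve been dumping the rendered config from the cluster. The ConfigMap and service names below are guesses based on a `tempo` release name, so adjust to yours:

```shell
# 1) Inspect the rendered config the chart generated (name depends on the release):
kubectl get configmap tempo-config -o yaml | grep -A4 'compaction:'

# 2) Ask a running Tempo component what it actually loaded, via its /status/config endpoint:
kubectl port-forward svc/tempo-compactor 3100:3100 &
curl -s http://localhost:3100/status/config | grep block_retention
```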
My Java app uses the OTel Java agent with sampler = always_on, sending OTLP to the collector. In Grafana → Explore → Tempo → Drilldown I run:
- Query: `{resource.service.name="java-app"}` (or `{}` with no filter)
- Time range: up to 7 days
- Limit: 5000
But I only ever see ~120 traces, all from today, even though the app handles many more requests and the generated Tempo config shows block_retention: 720h (not the default 48h).
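To rule out a UI-side limit, I’ve also been hitting Tempo’s search API directly and counting results myself. A rough sketch — the base URL assumes a `kubectl port-forward` to the query-frontend, which is an assumption about my setup:

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed address of the Tempo query-frontend (e.g. after a port-forward).
TEMPO_URL = "http://localhost:3100"

def build_search_url(base, service, hours, limit=5000, now=None):
    """Build a Tempo /api/search URL covering the last `hours` hours."""
    end = int(now if now is not None else time.time())
    start = end - hours * 3600
    params = urllib.parse.urlencode({
        "q": '{resource.service.name="%s"}' % service,
        "start": start,  # unix seconds
        "end": end,
        "limit": limit,
    })
    return "%s/api/search?%s" % (base, params)

def count_traces(base, service, hours):
    """Count traces returned by Tempo's search API for the given window."""
    with urllib.request.urlopen(build_search_url(base, service, hours)) as resp:
        body = json.load(resp)
    return len(body.get("traces", []))

# Example: compare a 1-day window against a 7-day window.
# print(count_traces(TEMPO_URL, "java-app", 24))
# print(count_traces(TEMPO_URL, "java-app", 24 * 7))
```

If the API also returns ~120 traces for the 7-day window, the loss is on the write/retention side rather than in Grafana.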
Questions:
- Does this config look sufficient to actually retain traces for 30 days with the local backend + ingester PVCs, or am I missing any Tempo overrides / per-tenant retention settings?
- With sampler = always_on, where would you look next for trace drops (Tempo or OTel Collector metrics/logs to check)?
- Are there any Tempo/Grafana query defaults or limits that could explain consistently seeing only ~120 traces despite a high limit and wide time range?
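In case it helps, these are the pipeline metrics I know how to check so far, assuming Prometheus scrapes both the OTel Collector’s self-telemetry and Tempo’s `/metrics` endpoints (metric names may carry a `_total` suffix depending on version):

```promql
# Did spans arrive at the collector, and did the export to Tempo succeed?
rate(otelcol_receiver_accepted_spans[5m])
rate(otelcol_exporter_sent_spans[5m])
rate(otelcol_exporter_send_failed_spans[5m])

# Did Tempo's distributor receive them, and is it discarding any (and why)?
rate(tempo_distributor_spans_received_total[5m])
sum by (reason) (rate(tempo_discarded_spans_total[5m]))
```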
Any pointers on what to check next would be greatly appreciated.