Hello! I can't query older traces from the Tempo datasource (storage is configured as an S3 bucket). Traces are available only for 48h.
Why can't Tempo query my traces from the S3 bucket? In our production infrastructure, we need to be able to query traces for 30 days.
Should I set block_retention to 30 days to solve the problem?
My bucket structure:
block_retention controls how long blocks (the files in the S3 bucket) are kept. The compactor is responsible for deleting them once they are older than that. Set
block_retention: 720h for 30 days of retention (30 × 24h = 720h). See the Tempo documentation.
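A minimal sketch of where this setting goes in the Tempo configuration file, assuming a plain Tempo config layout. If you deploy with a Helm chart, the same keys are nested under the chart's compactor values, so the exact path may differ; check your chart's values file.

```yaml
# Sketch: keep trace blocks in the backend (e.g. the S3 bucket) for 30 days.
compactor:
  compaction:
    # 30 days x 24h = 720h. The compactor deletes blocks older than this.
    block_retention: 720h
```

The value is written in hours because Tempo uses Go-style duration strings, which, as far as I know, have no day unit. Raising the value only affects blocks that still exist; traces whose blocks were already deleted will not come back.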
@mdisibio Hello! Could you please help me understand this metric: tempo_distributor_ingester_append_failures_total (panel "Failed batch sent to ingesters")?
I can't figure out what kind of failures this metric counts.
I see failures reported every time we send traces to Tempo. I also checked the metrics tempo_receiver_refused_spans and tempo_discarded_spans_total, and both are 0.
Hi, the metric tempo_distributor_ingester_append_failures_total means the distributor component had trouble forwarding traffic to the ingesters. More detail will be in the distributor logs, possibly the error "pusher failed to consume trace data". Based on your screenshot, it looks like some traffic was OK, because the bottom-left panel "Ingester Traces Created" has data.
Just want to ask a follow-up question regarding this one. In the documentation, there is another parameter that controls retention:
# Optional. Duration to keep blocks that have been compacted elsewhere. Default is 1h.
[compacted_block_retention: <duration>]
It wasn't quite obvious what this is used for. Could you please shed some light on it?
compacted_block_retention configures how long compacted blocks are kept in storage before deletion. When the compactor compacts blocks, it doesn't delete the input blocks right away; it marks them as compacted. Compacted blocks are then deleted later, asynchronously.
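To show how the two retention settings sit side by side, here is a sketch assuming the same compactor.compaction layout as the documentation excerpt quoted in the question. The 1h value is just the documented default, written out explicitly.

```yaml
# Sketch: both retention settings live under compactor.compaction.
compactor:
  compaction:
    # How long blocks are kept in the backend before the compactor deletes them.
    block_retention: 720h
    # How long input blocks that were already compacted into a newer block
    # stay in storage (marked as compacted) before they are deleted.
    # Default is 1h.
    compacted_block_retention: 1h
```

block_retention limits how far back you can query. compacted_block_retention only governs the short-lived duplicates left behind by compaction.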
@mariorodriguez, thanks a lot for the reply. It helps to know there is a separate garbage-collection process for this. But why keep a retention period for these blocks? Is it to protect the data against transient failures or crashes?
Mainly to help with block list maintenance. Once a new block is created by compacting others, it can take a while for all queriers to discover it and add it to their block lists. Not deleting compacted blocks right away allows queriers to fall back to them, which adds resilience to the read path.