Hello,
I have set up Tempo as a single binary with Azure blob storage, with the pod limited to 700 MB, for a relatively low-volume testing environment, and the pod restarts quite often.
I have raised the limit to 1 GB for now and it no longer seems to restart. But I don’t know how to calculate the required RAM based on the number of spans.
In the last 6 hours Tempo collected 39708 spans (~6600 spans per hour).
Is there a formula or an intuitive way of reasoning about RAM usage? I think even with 1000 spans per hour I would still get restarts.
Why does Grafana Tempo need so much RAM for such a low volume?
There are a number of factors that could be driving up your resource usage that we should take a look at. I will say that a volume of 6600 spans per hour is quite low so there should be no issues.
It would be interesting to know your bytes received per second. A single span can vary quite widely in size. Check tempo_distributor_bytes_received_total.
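For example, a quick way to see bytes per second in Prometheus (a minimal sketch; adjust the rate window and any label selectors to your setup):

```promql
# Bytes per second received by the Tempo distributor, averaged over 5 minutes
sum(rate(tempo_distributor_bytes_received_total[5m]))
```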
Another factor that could be driving up resource usage is total trace size. Have you adjusted overrides.max_bytes_per_trace? Are you seeing any log messages about max trace size?
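For reference, that override lives in the overrides block of the Tempo config; a rough sketch with an illustrative value:

```yaml
overrides:
  # Cap on the total size of a single trace; spans past this limit are
  # discarded (these show up as the "max trace size" log messages mentioned above)
  max_bytes_per_trace: 5000000   # ~5 MB, illustrative value only
```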
To drive down the memory used I would recommend adjusting the following settings:
The default for both of these settings is 10s. Maybe try them both at 5s? This will cause Tempo to keep fewer traces in memory and flush them more aggressively to disk.
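In config terms (assuming the two settings referred to here are the ingester's trace_idle_period and flush_check_period, which both default to 10s), the change would look roughly like this:

```yaml
ingester:
  # How long to wait after the last span of a trace before flushing it
  # out of live memory (assumed setting; default 10s)
  trace_idle_period: 5s
  # How often the ingester sweeps for traces ready to be flushed
  # (assumed setting; default 10s)
  flush_check_period: 5s
```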
Thank you for these hints,
I am using the tempo Helm chart with a very basic config, only Azure storage. I think the serviceMonitor Helm value does not work fully. I use the following tempo Helm values file:
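Roughly, an Azure-only setup with the grafana/tempo single-binary chart looks like this (a sketch with placeholder values, not my exact file):

```yaml
tempo:
  storage:
    trace:
      backend: azure
      azure:
        container_name: tempo-traces               # placeholder
        storage_account_name: <storage-account>    # placeholder
        storage_account_key: <storage-account-key> # placeholder
serviceMonitor:
  enabled: true   # the value referred to above; exact key may vary by chart version
```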
Instead, I do have the metrics from the source of Tempo’s spans, the Otel Collector, for the same 6h time range as in the first post, and they show a maximum burst of about 50 spans:
It is also interesting that there are memory usage jumps of 400 MB for the tempo container (same time range); I overlaid it with sent spans * 10 (to make them easier to see in the chart):
I think it comes down to those 400 MB spikes from Tempo.
I will try to check tempo_distributor_bytes_received_total with a manually applied ServiceMonitor.
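Something like this sketch is what I have in mind (the namespace, labels, and port name are guesses and need to match the Service the chart creates):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tempo
  namespace: tempo                       # assumed namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: tempo      # assumed label on the Tempo Service
  endpoints:
    - port: prom-metrics                 # assumed metrics port name
      path: /metrics
```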
I’m going to guess those spikes are occurring during compaction. The logs will clearly indicate every time a compaction starts. Perhaps you could confirm they correlate with the spikes?
It looks like a compaction cycle is started every second, but sometimes I get an extra message: level=info ts=2022-04-11T05:22:00.334767445Z caller=trace.go:68 msg="size of search data exceeded max search bytes limit" maxSearchBytes=5000 discardedBytes=752
The log lines you are seeing are simply Tempo looking for compaction jobs. This is a little spammy in 1.3. The frequency has been reduced in 1.4 and made configurable.
Lines like this indicate that an actual compaction job is taking place:
My guess is that the reason Tempo needs the RAM it does, even with a low ingest rate, is that the compaction cycle takes all of the tiny blocks created by the slow stream of ingestion and creates very large blocks.
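If that is what is happening, one thing you could experiment with, sketched below with an illustrative value, is capping the size of compacted blocks via the compactor's max_block_bytes (assuming your Tempo version exposes that setting):

```yaml
compactor:
  compaction:
    # Upper bound on the size of blocks produced by compaction, so a single
    # compaction job holds less data in memory at once
    # (illustrative value; verify the setting exists in your Tempo version)
    max_block_bytes: 1073741824   # 1 GiB
```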
I have about 400 MB of blob storage used for the default retention period, which I think is 10 days. Tempo is deployed as a single binary, I set the limit to 1.5 GB, and it still gets OOMKilled. Tempo could easily store those 400 MB of traces in RAM and not go over 1.5 GB of process memory usage. I don’t fully understand the high memory usage.
The issue occurs during the marshalling/unmarshalling of proto. proto.Unmarshal can allocate 100s of MBs even when unmarshalling an object in the 10s of MBs. This extreme pressure on the GC can cause OOMs.
Tempo is moving toward a Parquet block format because of this very issue and how it impacts both search and compaction. The Parquet format will store traces in object storage in a columnar layout to speed up search as well as improve the efficiency of compaction. Also, there are a ton of off-the-shelf tools that can use Parquet in object storage directly, which we hope people will leverage when using Tempo.
Thank you for your patience on this issue. To help us continue to improve:
Please try Tempo 1.4 and report any results.
Pass on any details about large traces in your backend to help us understand the cases that are causing the most problems.
After almost 1 year with single binary mode, traffic grew a lot and it is working fine with 4 GB of RAM.
Our current span throughput is on average about 2k spans per second (measured from our Otel Collector with sum(rate(otelcol_exporter_sent_spans[1h]))). 4 GB looks to be enough to cover spikes. CPU usage is 0.4.
We are now upgrading to the Tempo 2.0 distributed version. I will probably post some resource consumption numbers after some time.
I was running Tempo in monolithic mode with 2 GB but kept getting OOM errors. I increased memory to 4 GB and the OOM errors stopped, but I was wondering what could be causing the spikes in memory usage that you can see in the graph below.
I would correlate this with your logs, but my guess is that this spike occurs when the ingester cuts a block.
To reduce this you could cut blocks more frequently by reducing max_block_duration, or perhaps try increasing your trace_idle_period to reduce fragmented traces. Both of these settings live under the ingester block of the config (see the sketch below).
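A rough sketch of those two settings (the values are illustrative only):

```yaml
ingester:
  # Cut head blocks more often so less data accumulates in memory
  max_block_duration: 15m
  # Wait a little longer for late spans before flushing a trace,
  # to reduce fragmented traces
  trace_idle_period: 30s
```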
We have switched to the microservices version and our workload is much, much larger. Our Grafana Tempo deployment uses about 20 GB of memory combined across the microservices. We have 10k spans per minute!
Memory snapshot of Tempo for 10k spans per minute: