Tempo RAM usage for 6k spans per hour

Hello,
I have set up Tempo as a single binary with Azure blob storage in a relatively low-volume testing environment, with the pod memory limit at 700 MB, and the pod restarts quite often.

I have changed it to 1 GB for now and it looks like it doesn’t restart anymore, but I don’t know how to calculate the required RAM based on the number of spans.
In the last 6 hours Tempo collected 39708 spans (~6600 spans per hour).

Is there a formula or an intuitive way of reasoning about RAM usage? I think even with 1000 spans per hour I would still get restarts.

Why does Grafana Tempo need so much RAM for such a low volume?

There are a number of factors that could be driving up your resource usage that we should take a look at. I will say that a volume of 6600 spans per hour is quite low, so there should be no issues.

It would be interesting to know your bytes received per second. A single span can vary quite widely in size. Check the tempo_distributor_bytes_received_total metric.

Another factor that could be driving up resource usage is total trace size. Have you adjusted overrides.max_bytes_per_trace? Are you seeing any log messages about max trace size?
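
If it helps, here is a minimal sketch of where that override lives in the Tempo configuration; the value shown is only illustrative, not a recommendation:

overrides:
  # maximum size of a single trace in bytes; exceeding it produces the
  # "max trace size" style log messages mentioned above
  max_bytes_per_trace: 5000000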

To drive down the memory used I would recommend adjusting the following settings:

ingester:
  flush_check_period: 10s
  trace_idle_period: 10s

The default for both of these settings is 10s. Maybe try them both at 5s? This will cause Tempo to keep fewer traces in memory and flush them more aggressively to disk.
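
For example, the adjusted config could look like this (a sketch of the suggestion above, not a guaranteed fix):

ingester:
  # check for flushable traces twice as often as the 10s default
  flush_check_period: 5s
  # treat a trace as complete after 5s without new spans so it leaves
  # ingester memory sooner
  trace_idle_period: 5s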

Thank you for these hints,
I am using the Tempo Helm chart with a very basic config, only Azure storage. I think the serviceMonitor Helm value does not work fully. I use the following Tempo Helm values file:

replicas: 1
tempo:
  resources:
    requests:
      cpu: 400m
      memory: 1Gi
    limits:
      cpu: 400m
      memory: 1Gi
  storage:
    trace:
      backend: azure
      azure:
        container-name: ${tempo_container_name}
        storage-account-name: ${tempo_storageAccount_name}
        storage-account-key: ${tempo_storageAccount_key}

  searchEnabled: true

  serviceMonitor:
    enabled: true
    interval: 30s

  # only GRPC OTLP receiver
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"

But in exchange I have the metrics from the source of Tempo's spans, the Otel Collector, for the same 6h time range as the first post, and it looks like a max burst of 50 spans:

It is also interesting that there are memory usage jumps of 400 MB for the Tempo container (same time range); I also overlaid the sent spans * 10 (to make them easier to see in the chart):

I think it comes down to those 400 MB spikes from Tempo.
I will try to check tempo_distributor_bytes_received_total with a manual ServiceMonitor applied.
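
In case it helps, this is roughly the ServiceMonitor I would apply by hand; the namespace, label selector and port name are assumptions and have to match whatever Service the Tempo chart actually creates in your cluster:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tempo
  namespace: monitoring              # assumption: namespace the Prometheus Operator watches
spec:
  namespaceSelector:
    matchNames:
      - tempo                        # assumption: namespace of the Tempo Service
  selector:
    matchLabels:
      app.kubernetes.io/name: tempo  # assumption: labels on the Tempo Service
  endpoints:
    - port: http                     # assumption: named port that exposes /metrics (3100)
      path: /metrics
      interval: 30s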

I’m going to guess those spikes are occurring during compaction. The logs will clearly indicate every time a compaction starts. Perhaps you could confirm they correlate with the spikes?

Looks like a compaction cycle is started every second, but sometimes I get an extra message: level=info ts=2022-04-11T05:22:00.334767445Z caller=trace.go:68 msg="size of search data exceeded max search bytes limit" maxSearchBytes=5000 discardedBytes=752 :

This is an installation with just the Azure backend configuration.

The log lines you are seeing are simply Tempo looking for compaction jobs. This is a little spammy in 1.3. The frequency has been reduced in 1.4 and made configurable.

Lines like this indicate that an actual compaction job is taking place:

level=info ts=2022-04-12T13:04:25.952205846Z caller=compactor.go:64 msg="starting compaction cycle" tenantID=1 offset=0
level=info ts=2022-04-12T13:04:25.953805114Z caller=compactor.go:77 msg="Compacting hash" hashString=1-1-5499099
level=info ts=2022-04-12T13:04:25.953852311Z caller=compactor.go:115 msg="compacting block" block="&{Version:v2 BlockID:4a729ad5-08e1-411e-9dc6-f0397caa7381 MinID:[0 0 0 0 0 0 0 0 0 0 1 5 20 189 244 221] MaxID:[255 106 230 112 83 21 21 175 18 24 31 22 140 18 12 232] TenantID:1 StartTime:2022-04-12 02:12:28 +0000 UTC EndTime:2022-04-12 02:18:59 +0000 UTC TotalObjects:1599461 Size:4722129072 CompactionLevel:1 Encoding:lz4-1M IndexPageSize:256000 TotalRecords:14232 DataEncoding:v2 BloomShardCount:21}"

My guess is that the reason Tempo needs the RAM it does, even with a low ingest rate, is that the compaction cycle is taking all of the tiny blocks created by the slow stream of ingestion and creating very large blocks.

Hi, so can the RAM be reduced? Thanks

I have about 400 MB of blob storage used for the default retention period, which I think is 10 days. Tempo is deployed as a single binary and I set the limit to 1.5 GB, and it still gets OOMKilled. Tempo could easily store those 400 MB of traces in RAM and not go over 1.5 GB of process memory usage. I don't fully understand the high memory usage.

The issue occurs during the marshalling/unmarshalling of proto. proto.Unmarshal can allocate 100s of MBs even when unmarshalling an object in the 10s of MBs. This extreme pressure on the GC can cause OOMs.

Two things worth mentioning:

  1. Tempo 1.4 was just released today. This release does include some memory improvements during compaction, but we are still struggling with the same basic problem as above.
     https://grafana.com/blog/2022/05/02/new-in-grafana-tempo-1.4-introducing-the-metrics-generator/
     Release v1.4.0 · grafana/tempo · GitHub

  2. Tempo is moving toward a Parquet block format because of this very issue and how it impacts both search and compaction. This Parquet format will store traces in object storage in a columnar format to speed up search as well as improve the efficiency of compaction. Also, there are a ton of off-the-shelf tools that can use Parquet in object storage directly, which we hope people will leverage when using Tempo.

Thank you for your patience on this issue. To help us continue to improve:

  1. Please try Tempo 1.4 and report any results.
  2. Pass on any details about large traces in your backend to help us understand the cases that are causing the most problems.

I would like to try it but I can’t enable it in the single-binary Helm chart … I have created a pull request to enable custom settings: [tempo-single-binary] Add possibility for custom configuration of tempo settings via `extraConfig` new helm value by ovidiubuligan · Pull Request #1335 · grafana/helm-charts · GitHub

After almost 1 year with single-binary mode, traffic grew a lot and it is working fine with 4 GB of RAM.
Our current span throughput is on average about 2k spans per second (measured from our Otel Collector with sum(rate(otelcol_exporter_sent_spans[1h]))). 4 GB looks to be enough to cover spikes. CPU is 0.4.

We are now upgrading to the Tempo 2.0 distributed version. I will probably post some resource consumption numbers after some time.

I was running Tempo in monolithic mode with 2 GB but kept getting OOM errors. I increased memory to 4 GB and the OOM errors stopped, but I was wondering what could be causing the spikes in memory usage that you can see in the graph below.

They seem to occur approximately every 25 minutes.

Is it the compactor?

Any way I can reduce those spikes?

I would correlate this with your logs, but my guess is that this spike occurs when the ingester cuts a block.

To reduce this you could cut blocks more frequently by reducing max_block_duration or perhaps try to increase your trace_idle_period to reduce fragmented traces. Both of these settings are under the ingester: block.
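
As a sketch (the values are illustrative, not recommendations):

ingester:
  # cut head blocks more often so each flush out of memory is smaller
  max_block_duration: 15m
  # wait a bit longer before considering a trace complete, to reduce
  # fragmented traces
  trace_idle_period: 30s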

Thanks, I’ll try it and get back with the results.

Reducing max_block_duration worked, as you can see below.

Thanks again, @joeelliott !

Hi @ovidiubuligan, I am also facing the same issue. Did you find any solution for this?

We have switched to the microservices version and our workload is much, much larger. Our Grafana Tempo deployment uses about 20 GB of memory combined across the microservices. We have 10k spans per minute!
Memory snapshot of Tempo for 10k spans per minute:

Do more replicas of the ingester help in tempo-distributed? Or do they just do the same job instead of load balancing?