I’m currently running Tempo on our Docker Swarm using the “local” storage backend. We receive between 500 and 1000 spans per second.
Every day or two, our host reports a massive spike in disk read activity, completely saturating the CPU with iowait and eventually bringing down the swarm. If we stop sending spans to Tempo, the problem goes away. If we move our Tempo instance to a new swarm node (and redirect the traffic to it), the issue follows it. So it’s definitely Tempo causing this behaviour. Tempo is limited in vCPU.
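For context, this is roughly how I’m attributing the disk reads to the Tempo process, using the Linux `/proc/<pid>/io` counters (the process name `tempo` is how it appears on my host; adjust for your container setup — the snippet falls back to the current shell’s PID purely so it runs anywhere for demonstration):

```shell
# Find the Tempo process; fall back to this shell's PID if none is running,
# so the snippet is still demonstrable on any Linux box.
pid=$(pgrep -x tempo | head -n1)
pid=${pid:-$$}

# read_bytes = cumulative bytes this process caused to be fetched from the
# storage layer (actual disk reads, not page-cache hits).
reads=$(awk '/^read_bytes/ {print $2}' /proc/"$pid"/io)
echo "pid=$pid read_bytes=$reads"
```

Sampling this in a loop during a spike shows `read_bytes` climbing in step with the host’s iowait.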
Sometimes, but seemingly not always, these spikes line up with spikes in Tempo’s process_open_fds metric.
Is the solution to move to a different storage backend? Has anyone else experienced/solved something similar?
This is my current Tempo config:
```yaml
target: all

server:
  http_listen_port: 3200
  log_level: info

distributor:
  receivers:
    otlp:
      protocols:
        http:
        grpc:

compactor:
  compaction:
    block_retention: 72h
    compacted_block_retention: 15m

ingester:
  max_block_duration: 15m

storage:
  trace:
    backend: local
    block:
      v2_encoding: zstd
    wal:
      path: /tmp/tempo/wal
      v2_encoding: snappy
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100
      queue_depth: 10000
```