Traces disappearing after about an hour. I have no clue why

Hi,

I’m new to Tempo and still evaluating it. I installed it in our k8s cluster using the Bitnami chart from charts/bitnami/grafana-tempo at master · bitnami/charts · GitHub.

I left every config value at its default, besides enabling OTLP.

The cluster is running fine, apps are sending data, I can see loglines with traces in Loki and refer to them in Tempo.

But every trace ID is only visible for about an hour; after that I get a 404 from the querier. All log files look fine, with no errors. The only thing that correlates is an hourly job in the ingester:

level=info ts=2022-07-13T10:59:39.233248218Z caller=flush.go:168 msg="head block cut. enqueueing flush op" userid=single-tenant block=4b287f37-fb5b-4f8f-9b12-f4b4047fd4ca
level=info ts=2022-07-13T10:59:45.551680166Z caller=flush.go:244 msg="completing block" userid=single-tenant blockID=4b287f37-fb5b-4f8f-9b12-f4b4047fd4ca
level=info ts=2022-07-13T10:59:46.296392306Z caller=flush.go:251 msg="block completed" userid=single-tenant blockID=4b287f37-fb5b-4f8f-9b12-f4b4047fd4ca duration=744.723962ms
level=info ts=2022-07-13T10:59:46.301025265Z caller=flush.go:300 msg="flushing block" userid=single-tenant block=4b287f37-fb5b-4f8f-9b12-f4b4047fd4ca

When I log into the ingester pod and look into /bitnami/grafana-tempo/data it is still full of data from the past few days.

Here is the config file generated from the helm chart:

multitenancy_enabled: false
search_enabled: false
metrics_generator_enabled: false
compactor:
  compaction:
    block_retention: 48h
  ring:
    kvstore:
      store: memberlist
distributor:
  ring:
    kvstore:
      store: memberlist
  receivers:
    jaeger:
      protocols:
        thrift_http:
          endpoint: 0.0.0.0:14268
        grpc:
          endpoint: 0.0.0.0:14250
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
querier:
  frontend_worker:
    frontend_address: prometheus-stack-grafana-tempo-query-frontend-headless:9095
ingester:
  lifecycler:
    ring:
      replication_factor: 1
      kvstore:
        store: memberlist
    tokens_file_path: /bitnami/grafana-tempo/data/tokens.json
memberlist:
  abort_if_cluster_join_fails: false
  join_members:
    - prometheus-stack-grafana-tempo-gossip-ring
overrides:
  per_tenant_override_config: /bitnami/grafana-tempo/conf/overrides.yaml
server:
  http_listen_port: 3100
storage:
  trace:
    backend: local
    blocklist_poll: 5m
    local:
      path: /bitnami/grafana-tempo/data/traces
    wal:
      path: /bitnami/grafana-tempo/data/wal
    cache: memcached
    memcached:
      consistent_hash: true
      host: prometheus-stack-memcached
      service: memcache
      timeout: 500ms

As I said, I am fairly new to Tempo, so I have no idea where to look next. The block_retention: 48h setting gave me the impression that I should be able to query trace IDs for 48h.

Hi, I’m not familiar with that chart, but let’s see if we can get this fixed. It looks like Tempo is installed in microservices mode, where each component runs as separate pods. And it looks like Tempo’s backend data storage is a local disk. This section:

trace:
  backend: local
  local:
    path: /bitnami/grafana-tempo/data/traces

Is /bitnami/grafana-tempo/data/traces storage that is shared by all pods, or a per-pod folder? In the latter case, I think what is happening is that each pod only sees its own files. When installed in microservices mode, Tempo needs to be pointed at a shared storage backend like AWS S3, Google Cloud Storage, or Azure Blob Storage. Traces are written to blocks by the ingester, and then the ingester flushes the blocks to the shared storage. The querier then reads the blocks from the shared storage.
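
As a rough illustration of what pointing Tempo at shared object storage could look like, here is a minimal sketch of the storage section. It assumes an S3-compatible backend; the bucket name, endpoint, and credentials are placeholders rather than values from this thread, and the Bitnami chart may expose these settings under different value names.

```yaml
storage:
  trace:
    backend: s3                              # instead of `local`
    s3:
      bucket: my-tempo-traces                # hypothetical bucket name
      endpoint: s3.eu-west-1.amazonaws.com   # placeholder endpoint
      access_key: "REPLACE_ME"               # placeholder credentials
      secret_key: "REPLACE_ME"
    wal:
      path: /bitnami/grafana-tempo/data/wal  # the WAL stays on each ingester's own disk
```

With a setup along these lines, every ingester flushes to the same bucket and every querier reads from it, so it no longer matters which pod wrote a block.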

But yes, of course. It is not a shared storage space. Now that you point this out I feel stupid because it is so obvious :slight_smile:

I think I will switch to the monolithic Helm chart from Grafana for my tests; it seems a little less complex for learning Tempo.

Thank you for your help.

Hey it’s no problem at all! The monolithic chart is definitely easier to get started with, and if anything comes up please reach out again.

I am facing the same issue. I deployed Tempo in distributed mode. If it’s a shared volume issue, how does it manage to return traces during the first hour?
Below is the Tempo part of my values.yaml. Chart.yaml specifies the dependency as tempo-distributed 0.21.8.

tempo:
  distributor:
    resources:
      requests:
        cpu: 50m
        memory: 1.5Gi
      limits:
        cpu: 100m
        memory: 3Gi
  ingester:
    complete_block_timeout: 672h
  search:
    enabled: true
  traces:
    otlp:
      grpc:
        enabled: true
  overrides: |
    overrides:
      "*":
        max_traces_per_user: 0
        ingestion_rate_limit_bytes: 50000000
        max_search_bytes_per_trace: 0
        ingestion_burst_size_bytes: 80000000

When we search for traces we will search both the backend and the ingesters. This way we can also find traces that have just been received by Tempo.

After a block has reached a certain size or age, the ingester will write it to the backend (= the flush operation). Once it’s in the backend, the queriers should be able to find it. Because it might take a while for a block to be detected, the ingesters will also hold on to completed blocks for a little bit longer. This way the traces are still searchable, even if the queriers haven’t found this new block yet.
This duration is configurable with complete_block_timeout.

If you are running Tempo in distributed mode, you should configure a shared backend. In your config you haven’t specified a backend, so it will use local. This means blocks are stored on the local disk, making it impossible for the queriers to find them.
You should configure this section: helm-charts/values.yaml at tempo-distributed-0.21.8 · grafana/helm-charts · GitHub

The value of complete_block_timeout is also very high (4 weeks). This means blocks that have been flushed to the backend will only be removed after 4 weeks. While this ensures they remain searchable, you will be using a lot of local disk (= expensive) and searching the blocks will not scale.

We are required to use local as the backend, hence the settings weren’t changed. From issue 1223 I understand that all we can do is increase complete_block_timeout.
Even with complete_block_timeout set to 4 weeks, why are the blocks getting flushed after just 1 hour? I understand it’s expensive; I can reduce it to maybe 5 days.
And search.enabled doesn’t have any impact here? I hope I understood you correctly.

PS: The above configuration used to retain traces for more than a week, ever since May. We have only been seeing this issue for the last 2 weeks. We used to be on chart version 0.17.2; even after upgrading to 0.21.8, traces are only retained for an hour.

Blocks being flushed is controlled by two parameters: max_block_bytes and max_block_duration (by default 1G and 1 hour). So if a block reaches a certain size or age, the block will be cut anyway and flushed to the backend. After a block has been flushed it is considered ‘completed’ and the complete_block_timeout will control how long this block stays around.
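
To make the interplay concrete, here is a minimal sketch of the relevant ingester block in the Tempo config. The values are illustrative only: the first two mirror the defaults quoted above and the third mirrors the 672h from the values.yaml earlier in the thread, so check the documentation for your Tempo version before relying on them.

```yaml
ingester:
  max_block_duration: 1h        # cut the head block once it is this old ...
  max_block_bytes: 1073741824   # ... or once it reaches this size (1 GiB), whichever comes first
  complete_block_timeout: 672h  # how long a flushed block is kept around in the ingester
```

With settings like these a block is still cut and flushed after at most an hour; complete_block_timeout only decides how long the ingester keeps its own copy afterwards.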

What is the version of Tempo you were running before?

Are you searching for traces or only doing a trace ID lookup? We made some changes in the query-frontend config; in particular, the parameters query_backend_after and query_ingesters_until might be relevant.
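
For reference, in recent Tempo versions these two settings sit under the search section of the query-frontend config. The sketch below assumes that layout, which may differ in older releases such as 1.4, and the durations are illustrative rather than recommendations.

```yaml
query_frontend:
  search:
    query_backend_after: 15m    # search the backend only for data older than this
    query_ingesters_until: 1h   # search the ingesters only for data younger than this
```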

Application version: 1.4.0, chart version: 0.17.2

We have a derived field set in Loki which redirects to Tempo’s query using the trace ID.

I just noticed that my config might be wrong: complete_block_timeout sits directly under ingester, but according to values.yaml it should be ingester.config.complete_block_timeout. I will test and report back if that was the issue.

This worked:

  ingester:
    config:
      complete_block_timeout: 120h

Thank you!

I think that this ingester config is not the proper way to address your problem.

The ingester’s main responsibility is to feed data into storage and to serve queries for recently ingested traces. I don’t know if it rebuilds any cache after a restart, but I suspect that it doesn’t; in that case, you’d want your backend search to be configured properly.

It’s definitely not ideal to use Tempo distributed with a local backend, but this setup is the best you can do for now given:

We could improve this experience a bit (see #1223) but it’s not a mode Tempo is designed for.