Trace with a high number of spans doesn't open in Grafana UI

I have a trace with over 30k spans. I have tuned the Tempo backend to allow traces larger than 100 MB:

    querier:
      query_timeout: 60s
      max_concurrent_queries: 5
      frontend_worker:
        grpc_client_config:
          # 127mb
          max_recv_msg_size: 133554432
          max_send_msg_size: 133554432

    server:
      # 127mb
      grpc_server_max_recv_msg_size: 1.278576e+08
      grpc_server_max_send_msg_size: 1.278576e+08

Also, in the overrides:

    metrics_generator_processors: ['service-graphs', 'span-metrics']
    max_search_bytes_per_trace: 10000
    max_bytes_per_trace: 133554432 

The result of opening a big trace is the browser loading forever with the CPU crunching.

I see that Jaeger UI supports 80k spans in a trace. What UI limits does the Grafana Tempo UI have?

I would say there is no real hard limit, but it scales with your local machine's performance.

Try a more powerful machine for your browser and wait a few minutes. I guess all the processing/parsing is done in the browser, so it depends on the machine where it is running. It is an unusual trace size.


What would be the best way to reduce the size of the trace or make it debuggable? All the traces come from the same process. There is a known span (DoBigJob) that generates about 30k child spans.

Since most of the 30k spans are successful, I was thinking about a few options, but I am not sure which will work with Tempo.

  1. When the big processing starts, generate new trace contexts (new traces) for batches of 5k spans, and add span links from the BigJob span to those separate traces.

  2. Since most of the 30k spans are successful and not really of interest, sample only 10% of the spans created by BigJob and keep all errored-out spans. It may be doable with some sampler strategy that knows about the BigJob span. Can this sampling be done at the OtelCollector/Tempo level?

  3. Bonus question: is it possible to split big traces/spans into smaller ones with span links automatically from the Otel Collector/Tempo? (Probably it won't be possible without tail-based sampling.)
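The decision rule in option 2 can be sketched in plain Python (the `keep_span` helper, the `is_error` flag, and the 10% threshold are illustrative, not an OTel API): keep every errored span, and keep a deterministic ~10% of the successful ones by hashing the span ID, so the same span always gets the same decision regardless of which collector sees it.

```python
import hashlib
import random

KEEP_PERCENT = 10  # keep ~10% of successful spans

def keep_span(span_id: str, is_error: bool) -> bool:
    """Keep all errored spans, plus a deterministic ~10% of
    successful spans (hash-based, so the decision is stable)."""
    if is_error:
        return True
    bucket = int(hashlib.sha256(span_id.encode()).hexdigest(), 16) % 100
    return bucket < KEEP_PERCENT

# Simulate 30k child spans of DoBigJob, roughly 1% of them errored.
rng = random.Random(42)
spans = [(f"{rng.getrandbits(64):016x}", rng.random() < 0.01)
         for _ in range(30_000)]

kept = [s for s in spans if keep_span(*s)]
errors = sum(1 for _, err in spans if err)
print(f"kept {len(kept)} of {len(spans)} spans ({errors} errors, all retained)")
```

The hash-based bucket (instead of a random draw) matters if multiple collector instances see the same data: every instance reaches the same keep/drop verdict without coordination.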

For future people wondering: I managed to load the big trace.
I tried it in Firefox and it loaded after 20 minutes of 100% CPU. It stayed loaded only for a few seconds until I got "An unexpected error happened":

Error: Minified React error #185; visit https://reactjs.org/docs/error-decoder.html?invariant=185 for the full message or use the non-minified dev environment for full errors and additional helpful warnings.

I would filter out those successful/not-useful spans with the filter processor (opentelemetry-collector-contrib/processor/filterprocessor on GitHub). Then I guess you will have a much smaller trace in the backend and in the UI for processing.
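As a hedged sketch, a filter processor rule along these lines could drop non-errored child spans before they reach Tempo. The span name `DoBigJobChild` is a placeholder, and the OTTL condition syntax should be checked against the filterprocessor README for your collector version:

```yaml
processors:
  filter/drop-ok-children:
    error_mode: ignore
    traces:
      span:
        # Drop spans matching this condition: successful children of the big job.
        - 'name == "DoBigJobChild" and status.code != STATUS_CODE_ERROR'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop-ok-children, batch]
      exporters: [otlp]
```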

You tried Firefox; try also a Chromium-based browser. It has a different JS engine, which may handle your huge trace better (or not).


Sorry for asking so many questions, but is there a way to query for traces and show them in a table format? That would surely not hit React's limits. (I have an i7-11850 and 32 GB of RAM.)

Since Tempo 2.0 will be Parquet-based, will there be a way to query the 30k rows of a trace, maybe through a Parquet data source in Grafana?
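For what it's worth, Tempo 2.0 also introduces TraceQL, which returns matching spans rather than whole traces; a query along these lines (the service name is a placeholder) could narrow a huge trace down to just the failed spans:

```
{ resource.service.name = "bigjob-service" && status = error }
```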

:man_shrugging: I have never ever seen traces in a table; you would lose the span structure. Anyway, I don't see a Parquet data source in Grafana, so it is not a good idea.

Maybe try another trace backend, e.g. ClickHouse, which may allow you to query in a table-row format (one row = one span). I guess 30k rows in a Grafana table can also be a challenge for your browser.

