Over which attributes are TraceQL queries performant?

Traces can have many attributes with very high cardinality. I assume most of them are not indexed.

From what I could see of the architecture, traces are stored inside blocks with only bloom filters to locate the correct blocks for a given trace ID.

However, TraceQL lets us search by any attribute at all. I wasn’t able to find any information on how this is done, whether the querier has to go trace by trace or whether there are at least some indexes.

Loki solves this by having labels by which log lines are indexed, while the rest of the information is stored inside the line itself, usually as JSON or logfmt. There, however, we have to be explicit that we want to parse the attributes, because the search can then become costly. Is a similar solution used or planned for Tempo?

Tempo doesn’t index attributes, but rather uses a columnar storage format to support the search requirements you’re referencing. We use Apache Parquet.

What this allows Tempo to do is pull individual columns (mapped to different attributes) from storage when searching with TraceQL, reading a lot less data than with a more common row-wise model. So, for the query { resource.namespace = "prod" }, Tempo will only pull a single column, resource.namespace (roughly; it’s more complex than just that).
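To illustrate the principle (this is not Tempo’s actual read path or schema), here is a minimal sketch using pyarrow with a hypothetical Parquet file of spans: only the one column the filter needs is fetched from storage, and every other attribute column is skipped.

```python
# Minimal illustration of columnar reads with Apache Parquet.
# The file layout and column names below are hypothetical, not Tempo's schema.
import pyarrow as pa
import pyarrow.parquet as pq

# Write a tiny "spans" table with several attribute columns.
spans = pa.table({
    "trace_id": ["a1", "a2", "a3"],
    "resource.namespace": ["prod", "dev", "prod"],
    "span.http.url": ["/login", "/health", "/checkout"],
})
pq.write_table(spans, "spans.parquet")

# A query like { resource.namespace = "prod" } only needs one column,
# so a columnar reader fetches just that column and ignores the rest.
namespaces = pq.read_table("spans.parquet", columns=["resource.namespace"])
values = namespaces.column("resource.namespace").to_pylist()
print([i for i, ns in enumerate(values) if ns == "prod"])  # row indexes of matches
```

In a row-wise format, evaluating the same filter would mean reading every attribute of every span just to look at one of them.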

Columnar storage is very well suited to OLAP workloads: very large reads over a small subset of columns (i.e. TraceQL queries). This model has an impact on trace retrieval, but combined with other techniques, such as the bloom filters you mention, it offers a good balance. Tempo can store very high-cardinality data without a direct impact on storage cost (index size) or read performance.
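For the trace-ID path specifically, the idea is that a bloom filter per block lets the reader rule out blocks that definitely don’t contain a given trace ID, without maintaining a full index. A toy sketch of that idea, assuming made-up block names and filter parameters (not Tempo’s implementation):

```python
# Toy bloom filter used to decide which storage blocks *might* contain a trace ID.
# Purely illustrative; block names, sizes and hash choices are not Tempo's.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item: str):
        # Derive several bit positions by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

# One filter per block: open only the blocks whose filter says "maybe".
blocks = {"block-0001": BloomFilter(), "block-0002": BloomFilter()}
blocks["block-0001"].add("trace-abc123")
blocks["block-0002"].add("trace-def456")

candidates = [name for name, bf in blocks.items() if bf.might_contain("trace-abc123")]
print(candidates)  # likely ["block-0001"]; false positives possible, false negatives not
```

Only the candidate blocks then need to be opened and scanned, which keeps trace-by-ID lookups cheap even without an attribute index.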

We’ve recently published a design proposal for a new storage format, which is an iteration of previous Parquet-based versions. I think it could interest you. It goes over most of the concepts mentioned above.