The question I have is about the types of queries for traces.
For Loki with LogQL there are two types of queries: log queries, to find any log entry you are looking for, and metric queries, as described here: Metric queries | Grafana Loki documentation
I have used LogQL quite a lot to find all kinds of patterns in the log files I had.
The graphs really helped to identify problems and solve them.
For traces I am looking for a similar type of query.
I know how to use TraceQL to find certain traces.
But now I would like to know how often this happened over time, and whether it happens more often at certain moments.
How can I achieve that? Or is that simply not possible yet?
Hi cbos, the short answer is that TraceQL metric queries aren’t possible yet. But the long answer is they are already in progress and there are alternatives.
The Metrics-Generator component of Tempo produces metrics from traces and remote-writes them to a Prometheus-compatible backend. This solves many use cases, like checking the error rate or latency of a span or service over a time window. The series labels can be customized for your environment (for example by including http.url, region, Kubernetes cluster, etc.). There are two processors: Span Metrics and Service Graph.
There is also an experimental Metrics Summary API that returns data on demand and builds on TraceQL. Since it generates data on demand, it does not require a Prometheus-compatible backend or have the same cardinality restrictions.
I know span metrics, but they do not solve my question.
Currently I am working in a microservices landscape where a lot of services are connected.
With TraceQL I can find individual traces where a certain interaction pattern takes place, with some retries between some of the services.
I would like to know how often that happens between these services over time.
A query like count_over_time( …traceQL… [5m]) would help in that case.
Even if I could extract this particular case with span metrics, that would require a change in the deployment and would only help to find this pattern in the future, while I am doing ad-hoc analysis.
I will see if the Metrics Summary API can help me here.
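To make the count_over_time idea concrete, here is a minimal Python sketch of the counting such a query would do, run client-side over traces a TraceQL search has already found. It assumes the start times of the matching traces have been exported as Unix nanoseconds (for example the startTimeUnixNano values from search results); the function name count_per_window and the sample timestamps are hypothetical. It uses fixed, non-overlapping 5-minute windows, which is simpler than the sliding range evaluation LogQL performs.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # mirrors the [5m] range in the query idea


def count_per_window(start_times_unix_nano, window_seconds=WINDOW_SECONDS):
    """Count traces per fixed window, keyed by the window start time (UTC).

    Assumption: each input value is a trace start time in Unix nanoseconds,
    given as an int or a numeric string.
    """
    counts = Counter()
    for ts_nano in start_times_unix_nano:
        ts_seconds = int(ts_nano) // 1_000_000_000
        # Align the timestamp down to the start of its window.
        window_start = ts_seconds - (ts_seconds % window_seconds)
        counts[window_start] += 1
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): n
        for start, n in sorted(counts.items())
    }


if __name__ == "__main__":
    # Hypothetical start times of traces matching the retry pattern.
    sample = [
        "1700000000000000000",
        "1700000060000000000",
        "1700000100000000000",
        "1700000700000000000",
    ]
    for window, n in count_per_window(sample).items():
        print(window.isoformat(), n)
```

This only counts the traces the search actually returned, so any result limit on the search also caps the counts; it is a stopgap for ad-hoc analysis rather than a replacement for server-side metric queries.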