Metrics queries for traces like for logs in Loki

cbos · September 28, 2023, 12:17pm

The question I have is about the types of queries for traces.
For Loki, LogQL has two types of queries: log queries, to find any log entry you are looking for, and metric queries, as described here: Metric queries | Grafana Loki documentation

I have used LogQL quite a lot to find all kinds of patterns in my log files.
The graphs really helped to identify problems and solve them.

For traces, I am looking for a similar type of query.
I know how to use TraceQL to find certain traces.
But now I would like to know how often this happens over time, and whether it happens more often at certain moments.

How can I achieve that? Or is that simply not possible yet?

mdisibio · September 28, 2023, 12:54pm

Hi cbos, the short answer is that TraceQL metric queries aren’t possible yet. The long answer is that they are already in progress, and there are alternatives.

The Metrics-Generator component of Tempo produces metrics from traces and remote-writes them to a Prometheus-compatible backend. This solves many use cases, like checking the error rate or latency of a span or service over a time window. The series labels can be customized for your environment (for example, including http.url, region, Kubernetes cluster, etc.). There are two processors: Span Metrics and Service Graph.
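To make the metrics-from-traces idea concrete, here is a minimal Python sketch of what a span-metrics style processor does conceptually: every finished span increments a call counter and a latency sum in a series keyed by a set of labels. This is not Tempo's code; the `SpanMetrics` class, the span dict layout and the base label names are hypothetical simplifications. It also shows why adding a high-cardinality dimension such as http.url multiplies the number of series.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpanMetrics:
    """Hypothetical, simplified span-metrics aggregator (not Tempo's implementation)."""
    # Extra span attributes turned into series labels, e.g. ["http.url"].
    dimensions: list = field(default_factory=list)
    calls: dict = field(default_factory=lambda: defaultdict(int))
    latency_sum: dict = field(default_factory=lambda: defaultdict(float))

    def _series_key(self, span):
        # Base labels plus one label per configured dimension (missing -> "").
        base = (("service", span["service"]),
                ("span_name", span["name"]),
                ("status_code", span.get("status", "unset")))
        attrs = span.get("attributes", {})
        extra = tuple((d, attrs.get(d, "")) for d in self.dimensions)
        return base + extra

    def observe(self, span):
        # One series per distinct label combination.
        key = self._series_key(span)
        self.calls[key] += 1
        self.latency_sum[key] += span["duration_s"]

    def series_count(self):
        return len(self.calls)


if __name__ == "__main__":
    spans = [
        {"service": "A", "name": "GET /orders", "duration_s": 0.12,
         "attributes": {"http.url": "/orders/1"}},
        {"service": "A", "name": "GET /orders", "duration_s": 0.08,
         "attributes": {"http.url": "/orders/2"}},
    ]
    for dims in ([], ["http.url"]):
        metrics = SpanMetrics(dimensions=dims)
        for s in spans:
            metrics.observe(s)
        print(dims, metrics.series_count())
```

Because the series are built while spans are ingested, a new question (a new label or a new pattern) only gets answered for data arriving after the configuration changes.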

There is also an experimental Metrics Summary API that builds on TraceQL and returns data on demand. Because it generates data on demand, it does not require a Prometheus-compatible backend and does not have the same cardinality restrictions.
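As a rough illustration of "on demand", the Python sketch below takes spans that some TraceQL query has already matched and summarizes them per value of one attribute at request time, so no series need to exist beforehand. It does not reproduce the real Metrics Summary API request or response format; the `summarize` function, the span dict layout and the chosen statistics (span count, error count, nearest-rank p90) are assumptions for illustration only.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(spans, group_by):
    """Group already-matched spans by one attribute and summarize each group.

    Hypothetical helper: `spans` is a list of dicts with "attributes",
    "duration_ms" and an optional "status" key.
    """
    groups = defaultdict(list)
    for span in spans:
        groups[span["attributes"].get(group_by, "<none>")].append(span)

    summary = {}
    for value, members in groups.items():
        durations = [s["duration_ms"] for s in members]
        summary[value] = {
            "span_count": len(members),
            "error_count": sum(1 for s in members if s.get("status") == "error"),
            "p90_ms": nearest_rank_percentile(durations, 90),
        }
    return summary
```

Grouping by an attribute such as peer.service here costs nothing up front, which is the trade-off against pre-aggregated span metrics: the work happens per query instead of per ingested span.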

cbos · September 28, 2023, 1:12pm

@mdisibio
Thanks for your quick answer.

I know about span metrics, but they do not answer my question.

Currently I am working in a microservices landscape where a lot of services are connected.
With TraceQL I can find individual traces where a certain interaction pattern takes place, with some retries between some of the services.

{.service.name!="A"} >> {.service.name="A" && span.peer.service="B"}
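For readers less familiar with the structural operator, here is a minimal Python sketch of what `{left} >> {right}` selects within one trace: spans matching the right-hand condition that have at least one ancestor matching the left-hand condition. It is a simplified model, not Tempo's evaluator; the span dict layout (`span_id`, `parent_id`, a flat `attrs` map) is hypothetical, and it ignores TraceQL's distinction between resource and span attribute scopes.

```python
def ancestors(span, by_id):
    """Yield the parent, grandparent, ... of a span within one trace."""
    seen = set()
    parent_id = span.get("parent_id")
    # Stop at the root, at a missing parent, or on a malformed cycle.
    while parent_id is not None and parent_id in by_id and parent_id not in seen:
        seen.add(parent_id)
        parent = by_id[parent_id]
        yield parent
        parent_id = parent.get("parent_id")


def descendant_matches(trace_spans, left, right):
    """Spans matching `right` that have at least one ancestor matching `left`."""
    by_id = {s["span_id"]: s for s in trace_spans}
    return [s for s in trace_spans
            if right(s) and any(left(a) for a in ancestors(s, by_id))]


# Predicates mirroring the two span sets of the query above.
def left(s):
    return s["attrs"].get("service.name") != "A"


def right(s):
    return (s["attrs"].get("service.name") == "A"
            and s["attrs"].get("peer.service") == "B")
```

A trace "contains the pattern" when `descendant_matches` returns a non-empty list for it.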

I would like to know how often that happens between these services over time.
A query like count_over_time( …traceQL… [5m]) would help in that case.
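Until something like that exists, one ad-hoc workaround is to count matches client-side. The Python sketch below assumes you already have the start times of the matching traces (for example, collected from search results; how you collect them is not shown) and counts them per fixed 5-minute window, filling empty windows with 0 so quiet periods stay visible. This approximates a count_over_time with range equal to step; it is not a Tempo or TraceQL feature, and `count_per_window` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_per_window(timestamps, window=timedelta(minutes=5)):
    """Count events per fixed, non-overlapping window aligned to the Unix epoch.

    `timestamps` must be timezone-aware datetimes; windows with no events
    between the first and last non-empty window are reported as 0.
    """
    width = window.total_seconds()
    counts = Counter()
    for ts in timestamps:
        bucket_start = (ts.timestamp() // width) * width
        counts[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] += 1
    if not counts:
        return {}

    result = {}
    t, end = min(counts), max(counts)
    while t <= end:
        result[t] = counts.get(t, 0)
        t += window
    return result
```

Plotting the returned values over their window start times gives a rough picture of whether the pattern clusters at certain moments, limited by how many matching traces the search actually returned.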

Even if I could extract this particular case with span metrics, that would require a change in deployment and would only detect this pattern in the future.
Right now I am doing ad-hoc analysis.

I will see if the Metrics Summary API can help me here.

system · September 27, 2024, 1:13pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
