Tempo unable to handle 32-digit TraceIDs?

Hi
I’ve been working with Tempo for a few weeks now, trying to get a solid POC. I am able to create my own spans using a zipkin tracer (e.g. opentracing.StartSpan("myspan")) and retrieve them through the Grafana UI; that works. Our system is written in Go and uses the Gin-gonic framework, which has a middleware option for tracing.

github.com/gin-gonic/gin
github.com/opentracing-contrib/go-gin/ginhttp

tracer := CreateMyZipkinTracer() // Here I create my tracer
r := gin.New()
r.Use(ginhttp.Middleware(tracer))
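For completeness, CreateMyZipkinTracer builds an opentracing-compatible zipkin tracer. A rough sketch of what such a helper looks like with the openzipkin-contrib bridge (the reporter URL and service name are placeholders):

import (
	zipkinot "github.com/openzipkin-contrib/zipkin-go-opentracing"
	"github.com/openzipkin/zipkin-go"
	zipkinhttp "github.com/openzipkin/zipkin-go/reporter/http"
	"github.com/opentracing/opentracing-go"
)

func CreateMyZipkinTracer() opentracing.Tracer {
	// Report spans to Tempo's zipkin receiver (placeholder URL).
	reporter := zipkinhttp.NewReporter("http://tempo:9411/api/v2/spans")

	endpoint, err := zipkin.NewEndpoint("myservice", "localhost:0")
	if err != nil {
		panic(err)
	}

	nativeTracer, err := zipkin.NewTracer(reporter, zipkin.WithLocalEndpoint(endpoint))
	if err != nil {
		panic(err)
	}

	// Bridge the native zipkin-go tracer into the opentracing API.
	return zipkinot.Wrap(nativeTracer)
}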

So I decided to use it and pass my tracer to it. After sending a POST/GET request and extracting the TraceID from the span created by the Gin-gonic middleware (in the same way I do for my own spans), I noticed that the TraceID was 32 digits long instead of the usual 16 (e.g. 0ba9b72392e0f36a), so I wasn’t able to retrieve that TraceID from the UI.
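For context, this is roughly how I pull the TraceID out of the span inside a handler. It is a sketch that assumes the ginhttp middleware stores the server span on the request context and that the span context is the zipkin bridge's type:

// Needs "log", "net/http", the opentracing package and the
// openzipkin-contrib bridge (imported here as zipkinot).
r.GET("/ping", func(c *gin.Context) {
	if span := opentracing.SpanFromContext(c.Request.Context()); span != nil {
		// With the openzipkin-contrib bridge the span context wraps a zipkin model.SpanContext.
		if sc, ok := span.Context().(zipkinot.SpanContext); ok {
			log.Printf("traceID=%s spanID=%s", sc.TraceID.String(), sc.ID.String())
		}
	}
	c.String(http.StatusOK, "pong")
})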

Actually, the TraceID is defined as a 128-bit number, internally stored as two uint64s (high & low).
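That is also where the digit counts come from: the hex form is just the two halves printed back to back, roughly the way zipkin-go's model.TraceID renders it (sketch, needs "fmt"):

// Sketch of how a zipkin-style TraceID (two uint64 halves) renders as hex.
type TraceID struct {
	High, Low uint64
}

func (t TraceID) String() string {
	if t.High == 0 {
		return fmt.Sprintf("%016x", t.Low) // 64-bit id -> 16 hex digits
	}
	return fmt.Sprintf("%016x%016x", t.High, t.Low) // 128-bit id -> 32 hex digits
}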

I have tried using the low part, the high part, and both, but Tempo doesn’t retrieve the TraceID in the UI.

Why is Tempo not able to process these spans? Is there any configuration change to be made for this?

Thanks.

I believe 0ba9b72392e0f36a is a 64-bit trace id: there are 16 hexadecimal characters representing 4 bits each, and 16 * 4 = 64. Regardless, if you search for a trace id that is shorter than 128 bits (32 characters), Tempo will left-pad it with 0s and it should work.
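In other words, the id Tempo looks up is always 128 bits wide; a shorter one is just zero-extended on the left. Purely to illustrate the padding (not Tempo's actual code; needs "strings"):

// Left-pad a hex trace id to 32 characters.
func padTraceID(id string) string {
	return strings.Repeat("0", 32-len(id)) + id
}

// padTraceID("0ba9b72392e0f36a") == "00000000000000000ba9b72392e0f36a"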

Do you have any logs that may be helpful? I would also check metrics such as tempo_distributor_spans_received_total to confirm that Tempo is receiving spans.

Hi
Sorry for the delay, my dev env got toasted yesterday and I couldn’t do much.
I didn’t see any errors in the Tempo logs. I am using the single-binary version 0.7.0. tempo_distributor_spans_received_total is greater than zero.

I am using Azure as storage, but that shouldn’t matter, since I am able to retrieve 16-digit TraceIDs, just not 32-digit ones.

I found something in the logs that caught my attention:

I wonder where that first trace_id came from; of course I tried it, but nothing.

I can check anything else and try any suggestions.

Thanks.

traceID=6df6df65... is Tempo logging its own trace id for that transaction. It will not be in Tempo unless you’re having Tempo trace itself.

There really shouldn’t be any difference to Tempo between 128-bit and 64-bit trace ids. We use both internally without problems. Are some applications creating 64-bit ids and some 128-bit? Perhaps focus on the configuration differences between the two?
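For instance, with zipkin-go the trace id width is controlled by a tracer option, so two services built from the same packages can still differ if they construct the tracer differently. A sketch, going from memory since we don't use zipkin here (reporter and endpoint built as in the tracer sketch earlier in the thread):

// Default id generator (64-bit trace ids, 16 hex digits, as far as I recall):
tracer64, _ := zipkin.NewTracer(reporter, zipkin.WithLocalEndpoint(endpoint))

// Explicit 128-bit trace ids (32 hex digits):
tracer128, _ := zipkin.NewTracer(reporter,
	zipkin.WithLocalEndpoint(endpoint),
	zipkin.WithTraceID128Bit(true),
)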

Maybe we can add a debug log that just writes every trace id that Tempo receives to help confirm what is occurring internally.

I think you have a point regarding 64-bit vs 128-bit TraceIDs. For some reason, when I get the TraceID from the Gin-gonic context using our production code it comes out as 32 digits, but when I do the same thing in a standalone app, using the same packages and versions, the TraceID comes out as 16 digits. It doesn’t make sense. I am investigating that.

However, regardless of this, Tempo should be able to handle 32-digit TraceIDs. You mention “internally”; have you actually queried a 32-digit TraceID from the UI successfully? There might be an issue there, since /metrics shows the ingester/distributor is getting the spans. Is there a way to dump all current traces from Tempo for analysis?

I am going to keep looking into the Gin-gonic side. However, it would be useful to have those extra debug statements in Tempo; I’m not sure how to go about that.

Thanks.

Yup, I can query both 128 bit (32 digit) and 64 bit (16 digit) trace ids in Grafana.

Just submitted this:

After this is merged we will be able to turn on trace id logging which will allow you to directly see the trace ids that Tempo is receiving.

Looking forward to trying it. Hopefully it doesn’t take too long to get merged.

Thanks

This has been merged: Added trace id logging by joe-elliott · Pull Request #700 · grafana/tempo · GitHub

You can test it out using this image:
grafana/tempo:91b41f1b

It requires setting the following config option:

distributor:
  log_received_traces: true

or the equivalent CLI flag:

--distributor.log-received-traces=true

Note that this logs all trace ids and span ids, so it is for debugging only. Also, it logs those ids before rate limiting and before pushing to the ingesters, so we should keep an eye on the logs to make sure we’re not getting failures.

Hi

I think the 32-digit TraceIDs are not being received/accepted by the distributor.

Sending the request

These are the Tempo logs; Tempo doesn’t seem to receive it. I checked the traces that do show up and they belong to my service’s liveness probe. Somehow those are making it to Tempo, and they are 16 digits long. You can check the timestamps.

This is the code that gets the span and TraceID and logs it:

I didn’t see any relevant errors in the Tempo logs, and no sign of Tempo receiving it and discarding it.
Still scratching my head.

Thanks

That’s really weird: the trace id is set to the span id. Notice that on all the Tempo log lines the span id and trace id are equal.

Is there anything between Tempo and your application? It looks like you’re using zipkin for tracing. I suppose there’s the possibility that the zipkin => OTel translation is doing something funky.

EDIT:
We don’t use zipkin at Grafana, but I found a zipkin trace in the wild and the trace id was equal to the root span id. I wonder if this is something specific to the zipkin instrumentation library?

Another bit of weirdness. In this issue: What should a trace id and span id look like? · Issue #5 · openzipkin/b3-propagation · GitHub a comment indicates:

Traditionally, the start of a trace (root span) has the same value for trace id and span id. The root span has no parent id. Its child would share a trace id with its parent, but provision a new span id.

suggesting that the span id and trace id do match with 64 bit ids. I suppose the question is:

Why does your tracer report a full 128 bit trace id and at what point does the trace id get replaced with the root span id?

I think having TraceID=SpanID is expected, according to that comment and to what I’ve seen in my testing so far. Initially the root span id matches the trace id; as soon as I create new spans, the span ids change. I have seen that. What you see in the logs is the liveness probe, which has only one span, so the ids are equal. I don’t think the ids get replaced further down. It might be a bit weird, but it’s not exactly a problem.
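Roughly what I mean, as a sketch with the opentracing API (span names are just examples):

// Root span: with the zipkin/B3 convention its span id equals the trace id.
root := opentracing.StartSpan("liveness")

// Child span: shares the root's trace id but gets a new span id.
child := opentracing.StartSpan("work", opentracing.ChildOf(root.Context()))

child.Finish()
root.Finish()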

If I ignore that Gin-gonic span and create a new one, everything works and I get a 16-digit TraceID.

span := opentracing.StartSpan("myspan")

What I don’t know is why the Gin-gonic middleware is creating a 32-digit TraceID (which is valid anyway according to the specs) and why Tempo is not accepting it. That was my original concern.

Why is the distributor not detecting/receiving it? Is there some kind of filter before the span gets to the distributor?

Apologies for the late reply. I finally got around to setting up a little go/zipkin test of my own and found the same result you did. If I use

span := opentracing.StartSpan("myspan")

then the span id matches the trace id and everything works as expected.

What I don’t know is why the Gin-gonic middleware is creating a 32-digit TraceID (which is valid anyway according to the specs) and why Tempo is not accepting it. That was my original concern.

I did some quick testing by curling 2 zipkin spans into Tempo. One has a 64-bit trace id and one has a 128-bit trace id.

$ curl -X POST -d '[{"timestamp":1621457256581420,"duration":13,"traceId":"03dbddea2d2d9ac0","id":"03dbddea2d2d9ac0","name":"test!","localEndpoint":{"serviceName":"myservice","ipv4":"66.96.163.136","port":80}}]' http://localhost:9411
$ curl -X POST -d '[{"timestamp":1621457256581420,"duration":13,"traceId":"01234567890123456789012345678901","id":"03dbddea2d2d9ac0","name":"test!","localEndpoint":{"serviceName":"myservice","ipv4":"66.96.163.136","port":80}}]' http://localhost:9411

Tempo logged the trace/span ids:

level=info ts=2021-05-19T21:12:32.583187446Z caller=distributor.go:409 msg=received spanid=03dbddea2d2d9ac0 traceid=000000000000000003dbddea2d2d9ac0
level=info ts=2021-05-19T21:13:56.289703957Z caller=distributor.go:409 msg=received spanid=03dbddea2d2d9ac0 traceid=01234567890123456789012345678901

So it does appear that Tempo is ingesting 128-bit (32-character) trace ids correctly for zipkin traces.

If you would like to take this further, I think you would need to point your zipkin tracer at something like RequestBin (simple to get going with docker-compose). RequestBin will simply record the raw JSON that is POSTed, and we can see the exact messages that are being sent to Tempo.

I used it when I ran my local test and it will show you something like this:

You are correct, Tempo accepts both 32- and 16-digit TraceIDs. How do I know? Because I was initially using opentracing, but since it doesn’t seem to support CloudEvents I was forced to switch to opencensus, and that tracer creates 32-digit TraceIDs. I took one of those ids and it was processed by Tempo. I think the problem I observed may be caused by this code, which is doing something funky:

Anyway, I really appreciate the time you spent helping me out with this issue, which is certainly weird.
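For reference, this is roughly how I get those 32-digit TraceIDs out of opencensus (a minimal sketch; the span name is just an example):

import (
	"context"
	"log"

	"go.opencensus.io/trace"
)

func doWork(ctx context.Context) {
	_, span := trace.StartSpan(ctx, "myspan")
	defer span.End()

	// OpenCensus trace ids are 16 bytes, so this always prints 32 hex digits.
	log.Printf("traceID=%s", span.SpanContext().TraceID.String())
}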

On an unrelated issue: I am trying to set up Loki and Tempo in my k8s cluster. Loki listens on port 3100, and so does Tempo. What is the easiest way to change Tempo’s port? I am using the distributed version and helm. I have used:

--set server.http_listen_port="3101"

but it doesn’t seem to do the trick. Of course I downloaded the chart and made the proper changes in several places, but I’m not sure if that’s the way to go.

Thanks.

PS: Just for the sake of curiosity, I will use RequestBin to find out what the real problem is, but for the moment it can wait a bit.


I hack on the helm charts some, but we don’t use them internally so I can’t say I’m an expert. The tempo helm chart seems to support this var:

tempo.server.httpListenPort	

Config docs: helm-charts/charts/tempo at main · grafana/helm-charts · GitHub
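So something along these lines should work (untested on my end; the release name is just an example):

helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install tempo grafana/tempo --set tempo.server.httpListenPort=3101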

Unfortunately, the tempo-distributed helm chart only seems to support overriding the config as a whole.
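So the override would be your own full Tempo config in a values file, roughly like this (a sketch; I believe the key is config, but double-check the chart’s values.yaml for your version):

# custom-values.yaml (file name is just an example)
config: |
  server:
    http_listen_port: 3101
  # ...plus the rest of the Tempo config the chart would otherwise render...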
