Tempo unable to handle 32-digit TraceIDs?

Hi
I’ve been working with Tempo for a few weeks now, trying to get a solid POC. I am able to create my own spans using a zipkin tracer (e.g. opentracing.StartSpan("myspan")) and retrieve them through the Grafana UI; that works. Our system is written in Go and uses the Gin-gonic framework, which has a middleware option for tracing.

github.com/gin-gonic/gin
github.com/opentracing-contrib/go-gin/ginhttp

tracer := CreateMyZipkinTracer() // Here I create my tracer
r := gin.New()
r.Use(ginhttp.Middleware(tracer))
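For completeness, CreateMyZipkinTracer builds an opentracing-compatible zipkin tracer. A rough sketch of what such a helper looks like with the openzipkin-contrib bridge (the reporter URL and service name are placeholders):

import (
	zipkinot "github.com/openzipkin-contrib/zipkin-go-opentracing"
	"github.com/openzipkin/zipkin-go"
	zipkinhttp "github.com/openzipkin/zipkin-go/reporter/http"
	"github.com/opentracing/opentracing-go"
)

func CreateMyZipkinTracer() opentracing.Tracer {
	// Report spans to Tempo's zipkin receiver (placeholder URL).
	reporter := zipkinhttp.NewReporter("http://tempo:9411/api/v2/spans")

	endpoint, err := zipkin.NewEndpoint("myservice", "localhost:0")
	if err != nil {
		panic(err)
	}

	nativeTracer, err := zipkin.NewTracer(reporter, zipkin.WithLocalEndpoint(endpoint))
	if err != nil {
		panic(err)
	}

	// Bridge the native zipkin-go tracer into the opentracing API.
	return zipkinot.Wrap(nativeTracer)
}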

So I decided to use it and pass my tracer to it. After sending a POST/GET request and extracting the TraceID from the span created by the Gin-gonic middleware (in the same way I do for my own spans), I noticed that the TraceID was 32 digits long instead of the usual 16 (e.g. 0ba9b72392e0f36a), so I wasn’t able to retrieve that TraceID from the UI.
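For context, this is roughly how I pull the TraceID out of the span inside a handler. It is a sketch that assumes the ginhttp middleware stores the server span on the request context and that the span context is the zipkin bridge's type:

// Needs "log", "net/http", the opentracing package and the
// openzipkin-contrib bridge (imported here as zipkinot).
r.GET("/ping", func(c *gin.Context) {
	if span := opentracing.SpanFromContext(c.Request.Context()); span != nil {
		// With the openzipkin-contrib bridge the span context wraps a zipkin model.SpanContext.
		if sc, ok := span.Context().(zipkinot.SpanContext); ok {
			log.Printf("traceID=%s spanID=%s", sc.TraceID.String(), sc.ID.String())
		}
	}
	c.String(http.StatusOK, "pong")
})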

Actually, the TraceID is defined as a 128-bit number, internally stored as two uint64s (high & low).
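That is also where the digit counts come from: the hex form is just the two halves printed back to back, roughly the way zipkin-go's model.TraceID renders it (sketch, needs "fmt"):

// Sketch of how a zipkin-style TraceID (two uint64 halves) renders as hex.
type TraceID struct {
	High, Low uint64
}

func (t TraceID) String() string {
	if t.High == 0 {
		return fmt.Sprintf("%016x", t.Low) // 64-bit id -> 16 hex digits
	}
	return fmt.Sprintf("%016x%016x", t.High, t.Low) // 128-bit id -> 32 hex digits
}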

I have tried using the low part, the high part, and both, but Tempo doesn’t retrieve the TraceID in the UI.

Why is Tempo not able to process these spans? Is there any configuration change to be made for this?

Thanks.

I believe 0ba9b72392e0f36a is a 64-bit trace id: there are 16 hexadecimal characters representing 4 bits each, and 16 * 4 = 64. Regardless, if you search for a trace id that is shorter than 128 bits (32 characters), Tempo will left-pad it with 0s and it should work.
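In other words, the id Tempo looks up is always 128 bits wide; a shorter one is just zero-extended on the left. Purely to illustrate the padding (not Tempo's actual code; needs "strings"):

// Left-pad a hex trace id to 32 characters.
func padTraceID(id string) string {
	return strings.Repeat("0", 32-len(id)) + id
}

// padTraceID("0ba9b72392e0f36a") == "00000000000000000ba9b72392e0f36a"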

Do you have any logs that may be helpful? I would also check metrics such as tempo_distributor_spans_received_total to confirm that Tempo is receiving spans.

Hi
Sorry for the delay, my dev env got toasted yesterday and I couldn’t do much.
I didn’t see any errors in the Tempo logs. I am using the single-binary version 0.7.0. tempo_distributor_spans_received_total is greater than zero.

I am using Azure as storage, but that shouldn’t matter, since I am able to retrieve 16-digit TraceIDs, just not 32-digit ones.

I found something in the logs that caught my attention:

I wonder where that first trace_id came from; of course I tried it, but nothing.

I can check anything else and try any suggestions.

Thanks.

traceID=6df6df65... is Tempo logging its own trace id for that transaction. It will not be in Tempo unless you’re having Tempo trace itself.

There really shouldn’t be any difference to Tempo between 128-bit and 64-bit trace ids. We use both internally without problems. Are some applications creating 64-bit ids and some 128-bit? Perhaps focus on the configuration differences between the two?
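For instance, with zipkin-go the trace id width is controlled by a tracer option, so two services built from the same packages can still differ if they construct the tracer differently. A sketch, going from memory since we don't use zipkin here (reporter and endpoint built as in the tracer sketch earlier in the thread):

// Default id generator (64-bit trace ids, 16 hex digits, as far as I recall):
tracer64, _ := zipkin.NewTracer(reporter, zipkin.WithLocalEndpoint(endpoint))

// Explicit 128-bit trace ids (32 hex digits):
tracer128, _ := zipkin.NewTracer(reporter,
	zipkin.WithLocalEndpoint(endpoint),
	zipkin.WithTraceID128Bit(true),
)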

Maybe we can add a debug log that just writes every trace id that Tempo receives to help confirm what is occurring internally.

I think you have a point regarding 64-bit vs 128-bit TraceIDs. For some reason, when I get the TraceID from the Gin-gonic context using our production code it comes out as 32 digits, but when I do the same thing in a standalone app, using the same packages and versions, the TraceID comes out as 16 digits. It doesn’t make sense. I am investigating that.

However, regardless of this, Tempo should be able to handle 32-digit TraceIDs. You mention “internally”; have you actually queried a 32-digit TraceID from the UI successfully? There might be an issue there, since /metrics shows the ingester/distributor is getting the spans. Is there a way to dump all current traces from Tempo for analysis?

I am going to keep looking into the Gin-gonic side. However, it would be useful to have those extra debug statements in Tempo; I’m not sure how to go about that.

Thanks.

Yup, I can query both 128 bit (32 digit) and 64 bit (16 digit) trace ids in Grafana.

Just submitted this:

After this is merged we will be able to turn on trace id logging which will allow you to directly see the trace ids that Tempo is receiving.

Looking forward to trying it. Hopefully it doesn’t take too long to get merged.

Thanks

This has been merged: Added trace id logging by joe-elliott · Pull Request #700 · grafana/tempo · GitHub

You can test it out using this image:
grafana/tempo:91b41f1b

It requires setting the following config option:

distributor:
  log_received_traces: true

or the equivalent CLI flag:

--distributor.log-received-traces=true

Note that this logs all trace ids and span ids, so it is for debugging only. Also, it logs those ids before rate limiting and before pushing to the ingesters, so we should keep an eye on the logs to make sure we’re not getting failures.

Hi

I think the 32-digit TraceIDs are not being received/accepted by the distributor.

Sending the request

These are the Tempo logs; Tempo doesn’t seem to receive it. I checked the traces that do show up and they belong to my service’s liveness probe. Somehow those are making it to Tempo, and they are 16 digits long. You can check the timestamps.

This is the code that gets the span and TraceID and logs it:

I didn’t see any relevant errors in the Tempo logs, and no sign of Tempo receiving it and discarding it.
Still scratching my head.

Thanks

That’s really weird: the trace id is set to the span id. Notice that on all the Tempo log lines the span id and trace id are equal.

Is there anything between Tempo and your application? It looks like you’re using zipkin for tracing. I suppose there’s the possibility that the zipkin => OTel translation is doing something funky.

EDIT:
We don’t use zipkin at Grafana, but I found a zipkin trace in the wild and the trace id was equal to the root span id. I wonder if this is something specific to the zipkin instrumentation library?

Another bit of weirdness. In this issue: What should a trace id and span id look like? · Issue #5 · openzipkin/b3-propagation · GitHub a comment indicates:

Traditionally, the start of a trace (root span) has the same value for trace id and span id. The root span has no parent id. Its child would share a trace id with its parent, but provision a new span id.

suggesting that the span id and trace id do match with 64 bit ids. I suppose the question is:

Why does your tracer report a full 128 bit trace id and at what point does the trace id get replaced with the root span id?

I think having TraceID=SpanID is expected, according to that comment and to what I’ve seen in my testing so far. Initially the root span id matches the trace id; as soon as I create new spans, the span ids change. I have seen that. What you see in the logs is the liveness probe, which has only one span, so the ids are equal. I don’t think the ids get replaced further down. It might be a bit weird, but it’s not exactly a problem.
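Roughly what I mean, as a sketch with the opentracing API (span names are just examples):

// Root span: with the zipkin/B3 convention its span id equals the trace id.
root := opentracing.StartSpan("liveness")

// Child span: shares the root's trace id but gets a new span id.
child := opentracing.StartSpan("work", opentracing.ChildOf(root.Context()))

child.Finish()
root.Finish()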

If I ignore that Gin-gonic span and create a new one, everything works and I get a 16-digit TraceID.

span := opentracing.StartSpan("myspan")

What I don’t know is why the Gin-gonic middleware is creating a 32-digit TraceID (which is valid anyway according to the specs) and why Tempo is not accepting it. That was my original concern.

Why is the distributor not detecting/receiving it? Is there some kind of filter before the span gets to the distributor?

Apologies for the late reply. I finally got around to setting up a little go/zipkin test of my own and found the same result you did. If I use

span := opentracing.StartSpan("myspan")

then the span id matches the trace id and everything works as expected.

What I don’t know is why the Gin-gonic middleware is creating a 32-digit TraceID (which is valid anyway according to the specs) and why Tempo is not accepting it. That was my original concern.

I did some quick testing by curling 2 zipkin spans into Tempo. One has a 64-bit trace id and one has a 128-bit trace id.

$ curl -X POST -d '[{"timestamp":1621457256581420,"duration":13,"traceId":"03dbddea2d2d9ac0","id":"03dbddea2d2d9ac0","name":"test!","localEndpoint":{"serviceName":"myservice","ipv4":"66.96.163.136","port":80}}]' http://localhost:9411
$ curl -X POST -d '[{"timestamp":1621457256581420,"duration":13,"traceId":"01234567890123456789012345678901","id":"03dbddea2d2d9ac0","name":"test!","localEndpoint":{"serviceName":"myservice","ipv4":"66.96.163.136","port":80}}]' http://localhost:9411

Tempo logged the trace/span ids:

level=info ts=2021-05-19T21:12:32.583187446Z caller=distributor.go:409 msg=received spanid=03dbddea2d2d9ac0 traceid=000000000000000003dbddea2d2d9ac0
level=info ts=2021-05-19T21:13:56.289703957Z caller=distributor.go:409 msg=received spanid=03dbddea2d2d9ac0 traceid=01234567890123456789012345678901

So it does appear that Tempo is ingesting 128-bit (32-character) trace ids correctly for zipkin traces.

If you would like to take this further, I think you would need to point your zipkin tracer at something like RequestBin (simple to get going with docker-compose). RequestBin will simply record the raw JSON that is POSTed, and we can see the exact messages that are being sent to Tempo.

I used it when I ran my local test and it will show you something like this:

You are correct, Tempo accepts both 32- and 16-digit TraceIDs. How do I know? Because I was initially using opentracing, but since it doesn’t seem to support CloudEvents I was forced to switch to opencensus, and that tracer creates 32-digit TraceIDs. I took one of those ids and it was processed by Tempo. I think the problem I observed may be caused by this code, which is doing something funky:

Anyway, I really appreciate the time you spent helping me out with this issue, which is certainly weird.
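For reference, this is roughly how I get those 32-digit TraceIDs out of opencensus (a minimal sketch; the span name is just an example):

import (
	"context"
	"log"

	"go.opencensus.io/trace"
)

func doWork(ctx context.Context) {
	_, span := trace.StartSpan(ctx, "myspan")
	defer span.End()

	// OpenCensus trace ids are 16 bytes, so this always prints 32 hex digits.
	log.Printf("traceID=%s", span.SpanContext().TraceID.String())
}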

On an unrelated issue: I am trying to set up Loki and Tempo in my k8s cluster. Loki listens on port 3100, and so does Tempo. What is the easiest way to change Tempo’s port? I am using the distributed version and helm. I have used:

--set server.http_listen_port="3101"

but it doesn’t seem to do the trick. Of course I downloaded the chart and made the proper changes in several places, but I’m not sure if that’s the way to go.

Thanks.

PS: Just for the sake of curiosity, I will use RequestBin to find out what the real problem is, but for the moment it can wait a bit.


I hack on the helm charts some, but we don’t use them internally so I can’t say I’m an expert. The tempo helm chart seems to support this var:

tempo.server.httpListenPort	

Config docs: helm-charts/charts/tempo at main · grafana/helm-charts · GitHub
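So something along these lines should work (untested on my end; the release name is just an example):

helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install tempo grafana/tempo --set tempo.server.httpListenPort=3101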

Unfortunately, the tempo-distributed helm chart only seems to support overriding the config as a whole.
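So the override would be your own full Tempo config in a values file, roughly like this (a sketch; I believe the key is config, but double-check the chart’s values.yaml for your version):

# custom-values.yaml (file name is just an example)
config: |
  server:
    http_listen_port: 3101
  # ...plus the rest of the Tempo config the chart would otherwise render...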
