Assistance with Tempo TraceToLogs and Tempo’s trace not found error

I would appreciate some guidance on getting trace to logs working and on troubleshooting a trace not found error in our deployment.
I have deployed Grafana Loki, Tempo, Mimir, Alloy, and Beyla, and everything works beautifully (many thanks to the Grafana folks) except for two issues I can't seem to fix.

I have tried many suggested solutions and reviewed many posts, but so far no joy.

Below is my full Grafana configuration:


    datasources:
      enabled: true
      defaultDatasourceEnabled: true
      isDefaultDatasource: true
      createPrometheusReplicasDatasources: false
      label: grafana_datasource
      labelValue: "1"
      defaultDatasourceScrapeInterval: 15s
      name: Prometheus
      uid: prometheus
      url: http://prometheus-operated:9090
      timeout: 90
      httpMethod: POST
      exemplarTraceIdDestinations:
        datasourceUid: tempo
        traceIdLabelName: traceID
        label: grafana_datasource
        labelValue: "1"
      alertmanager:
        enabled: false

  additionalDataSources:
    - name: mimir
      uid: mimir
      type: prometheus
      url: http://grafana-mimir-nginx:80/prometheus 
      access: proxy
      isDefault: false
      version: 1
      editable: true
      orgId: 1
      jsonData:
        tlsSkipVerify: true
        timeout: 180
        httpMethod: POST
        manageAlerts: false
        prometheusType: Mimir
        prometheusVersion: 2.9.1
        cacheLevel: 'High'
        disableRecordingRules: true
        incrementalQueryOverlapWindow: 10m
        exemplarTraceIdDestinations:
        - datasourceUid: tempo
          name: traceID
    - name: tempo
      type: tempo
      uid: tempo
      url: http://grafana-tempo-gateway:80   
      access: proxy
      basicAuth: false
      isDefault: false
      version: 1
      editable: true
      jsonData:
        tracesToLogsV2:
          datasourceUid: 'loki'
          spanStartTimeShift: '-1h'
          spanEndTimeShift: '1h'
          tags: ['job_name', 'job', 'cluster', 'instance', 'pod', 'namespace', 'service_name']
          filterByTraceID: false  
          filterBySpanID: false  
          customQuery: false
          query: 'method="$${__span.tags.method}"' 
        tracesToMetrics:
          datasourceUid: mimir
          spanStartTimeShift: '1h'
          spanEndTimeShift: '-1h'
          tags: [{ key: 'service.name', value: 'service_name' }, { key: 'job', 'job_name', 'cluster', 'instance', 'pod', 'namespace', 'service_name' }]
          queries:
            - name: 'Sample query'
              query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
        nodeGraph:
          enabled: true
        search:
          hide: false
        traceQuery:
          timeShiftEnabled: true
          spanStartTimeShift: '1h'
          spanEndTimeShift: '-1h'
        spanBar:
          type: 'Tag'
          tag: 'http.path'
    - name: loki
      type: loki
      uid: loki
      access: proxy
      url: http://grafana-loki-query-frontend:3100   # validated
      editable: true
      jsonData:
        timeout: 90
        maxLines: 1000
        derivedFields:
        - datasourceUid: tempo
          matcherRegex: "(?:[tT]race[_]?[iI][dD])=(\\w+)"      # validated
          name: traceID
          url: '$${__value.raw}'
          urlDisplayLabel: 'View Trace with Internal Link'
        - matcherRegex: "(?:[tT]race[_]?[iI][dD])=(\\w+)"      # validated
          name: TraceIDext
          url: 'https://{domain}/trace/$${__value.raw}'
          urlDisplayLabel: 'View Trace with External Link'
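
Since both derived fields depend on the same `matcherRegex`, it can help to check what that pattern actually captures from a log line. The following is a minimal sketch in Python, assuming Python's `re` behaves like Grafana's regex engine for this simple pattern. The sample log lines and the `extract_trace_id` helper are hypothetical and not taken from the deployment.

```python
import re
from typing import Optional

# Same pattern as matcherRegex in the derivedFields above
# (the YAML string "\\w" is the regex \w).
MATCHER = re.compile(r"(?:[tT]race[_]?[iI][dD])=(\w+)")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the value the pattern captures, or None if it does not match."""
    match = MATCHER.search(log_line)
    return match.group(1) if match else None


# Hypothetical log lines for illustration only.
samples = [
    'level=info traceID=2a6a3fd202978dd8 msg="ok"',
    'level=info trace_id=0123456789abcdef0123456789abcdef msg="ok"',
    '{"traceID":"2a6a3fd202978dd8","msg":"ok"}',  # JSON style, no "=" after the key
]

for line in samples:
    print(repr(extract_trace_id(line)))
```

Because the pattern requires an `=` directly after the key, the JSON-style line is not matched at all. The pattern also captures whatever word characters follow the `=`, so the captured value is only as good as what the application logs in that field.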

Issues:

  1. I can’t get the trace to logs link to work, no matter what I try. I understand that labels are used to make the connection between Tempo and Loki, and the defined labels are all present in the logs (see the attached screenshots showing the log labels). I would appreciate guidance on what I need to correct to fix this.
  2. How does one troubleshoot and fix the following error?

     failed to get trace with id: 2a6a3fd202978dd8 Status: 404 Not Found Body: trace not found

When the View Trace button is clicked, a new Tempo window opens and shows the error. What’s the best way to figure out the issue here? I have checked permissions and the like, and all seem okay. I have also tried a number of suggested workarounds, but I cannot find the smoking gun.
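
One way to narrow down the 404 is to take Grafana out of the picture and ask Tempo for the trace directly. The sketch below assumes the gateway at `http://grafana-tempo-gateway:80` forwards Tempo's `GET /api/traces/<traceID>` endpoint, and that the `X-Scope-OrgID` header is only needed if multi-tenancy is enabled. The helper names are hypothetical. It also classifies the ID by length: OpenTelemetry trace IDs are 32 hex characters and span IDs are 16, and the ID in the error above has 16, so it is worth confirming that the logged field really holds the trace ID.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

TEMPO_URL = "http://grafana-tempo-gateway:80"  # taken from the data source config above


def describe_trace_id(trace_id: str) -> str:
    """Classify an ID by its hex length (32 chars = 128 bits, 16 chars = 64 bits)."""
    tid = trace_id.strip().lower()
    if not tid or any(c not in "0123456789abcdef" for c in tid):
        return "not hex"
    if len(tid) == 32:
        return "128-bit trace ID"
    if len(tid) == 16:
        return "64-bit ID (possibly a span ID or a shortened trace ID)"
    return f"unexpected length {len(tid)}"


def trace_url(base_url: str, trace_id: str) -> str:
    """Build the URL for Tempo's trace-by-ID endpoint."""
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}"


def fetch_trace(base_url: str, trace_id: str,
                tenant: Optional[str] = None,
                timeout: float = 10.0) -> Tuple[int, str]:
    """Query Tempo directly and return (HTTP status, response body)."""
    req = urllib.request.Request(trace_url(base_url, trace_id))
    if tenant:
        # Assumption: only relevant when Tempo runs with multi-tenancy enabled.
        req.add_header("X-Scope-OrgID", tenant)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


if __name__ == "__main__":
    tid = "2a6a3fd202978dd8"
    print(describe_trace_id(tid))
    print(trace_url(TEMPO_URL, tid))
    # Run the next line from a pod that can reach the gateway:
    # print(fetch_trace(TEMPO_URL, tid))
```

If the direct request also returns 404, that would suggest the problem is on the Tempo side (the trace was never stored under that ID, has expired, or belongs to a different tenant) rather than in the Grafana data source. If it returns the trace, the derived field or the link URL is the more likely culprit.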

Applicable screenshots describing the issues are attached below:

Thank you for your time and all guidance to close the loop on this.

Dot

I seem to have found a solution to the trace to logs part of my question. After iterating through multiple configuration options, I landed on the following Tempo setup, which works, at least so far:


    #https://grafana.com/docs/grafana/latest/datasources/tempo/configure-tempo-data-source/
    - name: tempo
      type: tempo
      uid: tempo
      url: http://grafana-tempo-gateway:80    
      access: proxy
      orgId: 1
      basicAuth: false
      isDefault: false
      version: 1
      editable: true
      jsonData:
        #https://grafana.com/docs/grafana/next/datasources/tempo/configure-tempo-data-source/#trace-to-logs
        tracesToLogsV2:
          datasourceUid: 'loki'
          spanStartTimeShift: '-1h'  
          spanEndTimeShift: '1h'
          tags: [{ key: 'service.name', value: 'service_name' }, { key: job }]
          filterByTraceID: true
          filterBySpanID: true
          customQuery: false   
          query: 'method="$${__span.tags.method}"'
          # https://grafana.com/docs/grafana/latest/datasources/tempo/configure-tempo-data-source/#custom-query-variables
        tracesToMetrics:
          datasourceUid: prometheus
          #datasourceUid: mimir
          spanStartTimeShift: '-1h'
          spanEndTimeShift: '1h'
          tags: [{ key: 'service_name', value: 'job_name' }, { key: job }]
          queries:
            - name: 'Sample query'
              #query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
              query: 'sum(rate(spans_total{status_code="error"}[5m])) by (service_name)'
        serviceMap:
          datasourceUid: 'prometheus'  
        lokiSearch:
          datasourceUid: loki
        streamingEnabled:
          search: false
        nodeGraph:
          enabled: true
        search:
          hide: false
        traceQuery:
          timeShiftEnabled: true
          spanStartTimeShift: '-1h'
          spanEndTimeShift: '1h'

The following screenshot shows the trace ID linked to logs. I’m going to continue testing this, but wanted to post it to help anyone in the same situation.


If anyone knows how to troubleshoot the following error, I would appreciate any guidance:

failed to get trace with id: 1becbb026f220c24 Status: 404 Not Found Body: trace not found

Thanks,