I'd appreciate some guidance on troubleshooting two issues in our deployment: the trace-to-logs link not working, and "trace not found" errors.
I have deployed Grafana Loki, Tempo, Mimir, Alloy and Beyla, and everything works beautifully (many thanks to the Grafana folks) except for these two issues, which I can't seem to fix.
I have tried many suggested solutions and reviewed many posts, but so far no joy.
Below is my full Grafana configuration:
datasources:
  enabled: true
  defaultDatasourceEnabled: true
  isDefaultDatasource: true
  createPrometheusReplicasDatasources: false
  label: grafana_datasource
  labelValue: "1"
  defaultDatasourceScrapeInterval: 15s
  name: Prometheus
  uid: prometheus
  url: http://prometheus-operated:9090
  timeout: 90
  httpMethod: POST
  exemplarTraceIdDestinations:
    datasourceUid: tempo
    traceIdLabelName: traceID
  alertmanager:
    enabled: false
additionalDataSources:
  - name: mimir
    uid: mimir
    type: prometheus
    url: http://grafana-mimir-nginx:80/prometheus
    access: proxy
    isDefault: false
    version: 1
    editable: true
    orgId: 1
    jsonData:
      tlsSkipVerify: true
      timeout: 180
      httpMethod: POST
      manageAlerts: false
      prometheusType: Mimir
      prometheusVersion: 2.9.1
      cacheLevel: 'High'
      disableRecordingRules: true
      incrementalQueryOverlapWindow: 10m
      exemplarTraceIdDestinations:
        - datasourceUid: tempo
          name: traceID
  - name: tempo
    type: tempo
    uid: tempo
    url: http://grafana-tempo-gateway:80
    access: proxy
    basicAuth: false
    isDefault: false
    version: 1
    editable: true
    jsonData:
      tracesToLogsV2:
        datasourceUid: 'loki'
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        tags: ['job_name', 'job', 'cluster', 'instance', 'pod', 'namespace', 'service_name']
        filterByTraceID: false
        filterBySpanID: false
        customQuery: false
        query: 'method="$${__span.tags.method}"'
      tracesToMetrics:
        datasourceUid: mimir
        spanStartTimeShift: '1h'
        spanEndTimeShift: '-1h'
        tags: [{ key: 'service.name', value: 'service_name' }, { key: 'job' }, { key: 'job_name' }, { key: 'cluster' }, { key: 'instance' }, { key: 'pod' }, { key: 'namespace' }, { key: 'service_name' }]
        queries:
          - name: 'Sample query'
            query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
      nodeGraph:
        enabled: true
      search:
        hide: false
      traceQuery:
        timeShiftEnabled: true
        spanStartTimeShift: '1h'
        spanEndTimeShift: '-1h'
      spanBar:
        type: 'Tag'
        tag: 'http.path'
  - name: loki
    type: loki
    uid: loki
    access: proxy
    url: http://grafana-loki-query-frontend:3100 # validated
    editable: true
    jsonData:
      timeout: 90
      maxLines: 1000
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: "(?:[tT]race[_]?[iI][dD])=(\\w+)" # validated
          name: traceID
          url: '$${__value.raw}'
          urlDisplayLabel: 'View Trace with Internal Link'
        - matcherRegex: "(?:[tT]race[_]?[iI][dD])=(\\w+)" # validated
          name: TraceIDext
          url: 'https://{domain}/trace/$${__value.raw}'
          urlDisplayLabel: 'View Trace with External Link'
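Both derived fields share the same matcherRegex, so the Loki-to-Tempo side can be sanity-checked outside Grafana by running the pattern over a few log lines. Below is a minimal Python sketch. The sample lines are invented for illustration, not taken from these logs. For plain ASCII input like this, Python's `re` should behave the same as the JavaScript regex engine Grafana uses for this pattern. Grafana links on the regex's capture group, so that group has to hold the complete trace ID.

```python
import re
from typing import Optional

# matcherRegex from the Loki derivedFields config. "\\w" inside the YAML
# double-quoted string is an escaped backslash, so the pattern Grafana
# receives is (?:[tT]race[_]?[iI][dD])=(\w+).
TRACE_ID_PATTERN = re.compile(r"(?:[tT]race[_]?[iI][dD])=(\w+)")


def extract_trace_id(line: str) -> Optional[str]:
    """Return the first capture group, i.e. the value a derived field would link on."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None


# Invented sample lines covering a few common log formats.
SAMPLES = [
    'level=info traceID=4bf92f3577b34da6a3ce929d0e0e4736 msg="ok"',  # logfmt, camel case
    "level=info trace_id=2a6a3fd202978dd8 msg=ok",  # logfmt, snake case, 16 hex chars
    'level=info trace_id="4bf92f3577b34da6a3ce929d0e0e4736"',  # quoted value
    '{"level":"info","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}',  # JSON
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(extract_trace_id(sample)), "<-", sample)
```

The pattern requires `=` to be followed directly by word characters, so JSON fields (`"trace_id":"..."`) and quoted logfmt values (`trace_id="..."`) produce no match and therefore no link.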
Issues:
- I can't make the trace-to-logs link work, no matter what I try. I understand that labels are used to make the connection between Tempo and Loki, and all of the labels listed under the tracesToLogsV2 tags do appear in the logs (see the attached screenshots of the log labels). What do I need to correct to fix this?
- How does one troubleshoot and fix the following error?
  failed to get trace with id: 2a6a3fd202978dd8 Status: 404 Not Found Body: trace not found
  When the View Trace button is clicked, a new Tempo window opens and shows this error. What's the best way to track down the cause? I have checked permissions and the like, and everything seems okay. I have also tried a number of suggested workarounds, but I cannot find the smoking gun.
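To make the first issue concrete: as I understand the tracesToLogsV2 settings, each tag's key is a span attribute name and the optional value is the Loki label it maps to. Only tags the span actually carries end up in the logs query. Spans from Beyla or Alloy may carry OpenTelemetry-style keys such as `k8s.namespace.name` rather than `namespace`. In that case, plain keys like 'pod' or 'namespace' would never match. The Python sketch below is a rough approximation of that matching, not Grafana's code. The `build_loki_selector` helper and the span attribute values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of one tracesToLogsV2 tag: (span attribute key, optional
# Loki label name). A plain string such as 'pod' corresponds to ("pod", None).
TagMapping = Tuple[str, Optional[str]]


def build_loki_selector(span_attrs: Dict[str, str], tags: List[TagMapping]) -> Optional[str]:
    """Roughly approximate the stream selector behind a trace-to-logs link.

    span_attrs merges span and resource attributes. Only tags whose key is
    present on the span add a matcher. With no matches there is no selector,
    and so no link. This is an illustration only, not Grafana's implementation.
    """
    matchers = []
    for key, label in tags:
        if key in span_attrs:
            matchers.append(f'{label or key}="{span_attrs[key]}"')
    return "{" + ", ".join(matchers) + "}" if matchers else None


# Hypothetical span attributes using OpenTelemetry semantic-convention keys.
span = {
    "service.name": "checkout",
    "k8s.namespace.name": "shop",
    "k8s.pod.name": "checkout-7d9f",
}

# Tags as currently configured: plain keys, none of which exist on this span.
current = [(t, None) for t in ["job_name", "job", "cluster", "instance", "pod", "namespace", "service_name"]]

# Tags that map span attribute keys to Loki label names.
mapped = [
    ("service.name", "service_name"),
    ("k8s.namespace.name", "namespace"),
    ("k8s.pod.name", "pod"),
]

if __name__ == "__main__":
    print(build_loki_selector(span, current))
    print(build_loki_selector(span, mapped))
```

In the data source config, the mapped form corresponds to entries like `{ key: 'k8s.namespace.name', value: 'namespace' }`, the same object shape already used under tracesToMetrics. Which keys to use depends on the attributes the spans actually carry, which can be seen in the span details in Grafana.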
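For the second issue, one way to split the problem is to ask Tempo for the trace directly, bypassing Grafana. Tempo's trace-by-ID endpoint is `GET /api/traces/<traceID>`, and the base URL to use is the gateway URL from the tempo data source in the config. The ID in the error, 2a6a3fd202978dd8, is 16 hex characters (64 bits). OpenTelemetry trace IDs are 32 hex characters (128 bits), and span IDs are 16, so it is worth confirming that the log field really holds the trace ID. Below is a minimal sketch that uses only the standard library. The helper names are hypothetical. The `X-Scope-OrgID` header only matters if Tempo multitenancy is enabled, which this config doesn't show either way.

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# A trace ID is at most 128 bits, i.e. 32 hex digits.
HEX_ID = re.compile(r"[0-9a-fA-F]{1,32}")


def id_bits(raw: str) -> int:
    """Bits encoded by a hex ID: 64 for a span ID or 64-bit trace ID, 128 for an OTel trace ID."""
    value = raw.strip()
    if not HEX_ID.fullmatch(value):
        raise ValueError(f"not a hex ID of at most 32 digits: {raw!r}")
    return len(value) * 4


def trace_by_id_url(base_url: str, trace_id: str) -> str:
    """Build the URL for Tempo's trace-by-ID endpoint, GET /api/traces/<traceID>."""
    id_bits(trace_id)  # validate before building the URL
    return f"{base_url.rstrip('/')}/api/traces/{trace_id.strip()}"


def fetch_status(url: str, tenant: Optional[str] = None, timeout: float = 10.0) -> int:
    """GET the URL and return the HTTP status code (e.g. 200 or 404).

    If tenant is given, it is sent as X-Scope-OrgID. This only matters when Tempo
    runs with multitenancy enabled. Connection failures raise URLError.
    """
    request = urllib.request.Request(url)
    if tenant:
        request.add_header("X-Scope-OrgID", tenant)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `fetch_status(trace_by_id_url("http://grafana-tempo-gateway:80", "2a6a3fd202978dd8"))` would need to run from inside the cluster, where the gateway hostname resolves. If Tempo itself also answers 404, that would suggest the problem lies with ingestion, the tenant, or the ID itself, rather than with the Grafana data source settings.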
Screenshots illustrating both issues are attached below:
Thank you for your time, and for any guidance on closing the loop on this.
Dot