Not able to get traces in Tempo from Grafana Loki and the OTel Collector

I have been stuck on this issue for more than two weeks. I have Fluent Bit shipping the logs like this (Fluent Bit → Loki, and OTel Collector → Tempo). I have made all the changes, but I am not able to get traces in Tempo; it gives the error “failed to get trace with id: 154717fffa19312ec6fea8533b64400f Status: 404 Not Found Body: trace not found”. Can someone please help here? (I deployed “tempo”, “loki”, and “otel-collector” via Helm.) Here is my Fluent Bit ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: monitoring-loki
data:
  fluent-bit.conf: |

   [SERVICE]
     Flush 1
     Log_Level info
     Daemon off 
     Parsers_File parsers.conf

   [INPUT]
     Name tail
     Path /var/log/*.log
     Parser docker
     Tag kube.* 
     Refresh_Interval 5
     Mem_Buf_Limit 5MB
     Skip_Long_Lines On

   [FILTER]
     Name kubernetes
     Match kube.*
     Kube_URL https://kubernetes.default.svc:443
     Kube_Tag_Prefix kube.var.log.containers.
     Merge_Log On
     Merge_Log_Key log_processed
     K8S-Logging.Parser On
     K8S-Logging.Exclude Off

   [OUTPUT] 
     Name loki
     Match kube.*
     Host "loki-svc.monitoring-loki"
     tenant_id ""
     Port "3100"
     label_keys $trace_id
     auto_kubernetes_labels on

   [OUTPUT]
     Name opentelemetry
     Match kube.*
     Host "otel-collector-svc.monitoring-loki"
     Port "55680" 
     Traces_uri /v1/traces
     Logs_uri /v1/logs
Below are my Helm values for the OpenTelemetry Collector:
mode: daemonset

presets:
  # enables the k8sattributesprocessor and adds it to the traces, metrics, and logs pipelines
  kubernetesAttributes:
    enabled: true
  # enables the kubeletstatsreceiver and adds it to the metrics pipelines
  kubeletMetrics:
    enabled: true
  # Enables the filelogreceiver and adds it to the logs pipelines
  logsCollection:
    enabled: true
## The chart only includes the loggingexporter by default
## If you want to send your data somewhere you need to
## configure an exporter, such as the otlpexporter
config:
  exporters:
    logging:
      loglevel: info
    otlp:
      endpoint: "tempo-monitoring-loki.svc.cluster.local:4317"
  receivers:
    otlp:
      protocols:
        grpc: 
          endpoint: "0.0.0.0:14250"
        http:
          endpoint: "0.0.0.0:4318"
  service:
    pipelines:
      traces:
        exporters: 
          - logging
          - otlp
        receivers:
          - otlp
        processors:
          - memory_limiter
          - batch    
      # metrics:
      #   exporters: [ otlp ]
      # logs:
      #   exporters: [ otlp ]
Below are my Tempo Helm values:

storage:
  trace:
    backend: local 
    local:
      volume:
        persistentVolumeClaim:
          claimName: storage-tempo-0

minio:
  enabled: false

distributor:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:14250"
          http:
            endpoint: "0.0.0.0:4318"
traces:
  otlp:
    grpc:
      enabled: true
    http: 
      enabled: true
  zipkin:
    enabled: false
  jaeger: 
    thriftHttp:
      enabled: false
  opencensus:
    enabled: false

I am getting an error in Tempo that the trace is not found.

What do you see when you run the {} TraceQL query in Tempo?

There might be some misconfiguration in the pipeline, or, if you have sampling enabled, your traces might be dropped by the sampler.
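
For example, a sampling processor such as probabilistic_sampler in the collector's traces pipeline will drop a share of traces; this is only an illustration of what to look for, not something taken from the values posted above:

processors:
  probabilistic_sampler:
    sampling_percentage: 10   # keeps roughly 10% of traces, drops the rest

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [otlp]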

I recommend using something like otel-cli (GitHub - equinix-labs/otel-cli: OpenTelemetry command-line tool for sending events from shell scripts & similar environments) to push a test trace through the pipeline.

Check out the examples for Tempo configs, and the intro-to-mltp example for an end-to-end setup.
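
For reference, the Tempo examples use a raw tempo.yaml rather than Helm values; a minimal sketch in that spirit (the port, paths, and retention here are illustrative, not taken from the deployment discussed in this thread):

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
        http:

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks

compactor:
  compaction:
    block_retention: 24h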

Once I run a trace query in Tempo, I get the error below:
“failed to get trace with id: 803196185ebb35907e9669404eef14dd Status: 404 Not Found Body: trace not found”


I have this deployment in an EKS cluster.
Drilling further into the logs of the OTel Collector pod, I see:

warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
Below are more of the logs from the OTel Collector (my services are listed further down):

2024-02-15T11:55:13.407Z        info    service@v0.93.0/telemetry.go:76 Setting up own telemetry...
2024-02-15T11:55:13.407Z        info    service@v0.93.0/telemetry.go:146        Serving metrics {"address": "10.92.114.166:8888", "level": "Basic"}
2024-02-15T11:55:13.408Z        info    memorylimiter/memorylimiter.go:160      Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "total_memory_mib": 15692, "limit_percentage": 80, "spike_limit_percentage": 25}
2024-02-15T11:55:13.408Z        info    memorylimiter/memorylimiter.go:77       Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "limit_mib": 12553, "spike_limit_mib": 3923, "check_interval": 5}
2024-02-15T11:55:13.408Z        info    exporter@v0.93.0/exporter.go:275        Development component. May change in the future.      {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-02-15T11:55:13.408Z        info    exporter@v0.93.0/exporter.go:275        Development component. May change in the future.      {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-02-15T11:55:13.409Z        info    service@v0.93.0/service.go:139  Starting otelcol-contrib...     {"Version": "0.93.0", "NumCPU": 4}
2024-02-15T11:55:13.409Z        info    extensions/extensions.go:34     Starting extensions...
2024-02-15T11:55:13.409Z        info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "health_check"}
2024-02-15T11:55:13.409Z        info    healthcheckextension@v0.93.0/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"10.92.114.166:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-02-15T11:55:13.409Z        info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "health_check"}
2024-02-15T11:55:13.409Z        warn    internal@v0.93.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks       {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-02-15T11:55:13.409Z        info    otlpreceiver@v0.93.0/otlp.go:102        Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-02-15T11:55:13.410Z        warn    internal@v0.93.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks       {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-02-15T11:55:13.410Z        info    otlpreceiver@v0.93.0/otlp.go:152        Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}
2024-02-15T11:55:13.410Z        info    k8sobjectsreceiver@v0.93.0/receiver.go:73       Object Receiver started {"kind": "receiver", "name": "k8sobjects", "data_type": "logs"}
2024-02-15T11:55:13.410Z        info    k8sobjectsreceiver@v0.93.0/receiver.go:93       Started collecting      {"kind": "receiver", "name": "k8sobjects", "data_type": "logs", "gvr": "events.k8s.io/v1, Resource=events", "mode": "watch", "namespaces": []}
2024-02-15T11:55:13.410Z        info    prometheusreceiver@v0.93.0/metrics_receiver.go:240      Starting discovery manager      {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-02-15T11:55:13.411Z        info    prometheusreceiver@v0.93.0/metrics_receiver.go:231      Scrape job added        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "opentelemetry-collector"}
2024-02-15T11:55:13.411Z        info    prometheusreceiver@v0.93.0/metrics_receiver.go:282      Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-02-15T11:55:13.440Z        info    k8sclusterreceiver@v0.93.0/receiver.go:53       Starting shared informers and wait for initial cache sync.     {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-02-15T11:55:13.443Z        info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-02-15T11:55:13.443Z        info    service@v0.93.0/service.go:165  Everything is ready. Begin running and processing data.
2024-02-15T11:55:13.449Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:13.649Z        info    k8sclusterreceiver@v0.93.0/receiver.go:74       Completed syncing shared informer caches.     {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-02-15T11:55:14.454Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:16.356Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:19.149Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:21.993Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-02-15T11:55:23.786Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:23.798Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 686, "metrics": 1610, "data points": 1610}
2024-02-15T11:55:26.202Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-02-15T11:55:27.203Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 2}
2024-02-15T11:55:29.007Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 2}
2024-02-15T11:55:29.930Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:33.415Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-02-15T11:55:33.816Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 686, "metrics": 1612, "data points": 1614}
2024-02-15T11:55:39.486Z        warn    zapgrpc/zapgrpc.go:195  [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:41.830Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 2}
2024-02-15T11:55:42.652Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 3, "log records": 3

Below are my services in the monitoring-loki namespace for the OTel Collector and Tempo:


otel-collector                                   ClusterIP   172.20.74.234    <none>        55680/TCP,14250/TCP,14268/TCP,9411/TCP,8888/TCP,8889/TCP                                                  9d
otel-collector-cluster-opentelemetry-collector   ClusterIP   172.20.146.18    <none>        6831/UDP,14250/TCP,14268/TCP,4317/TCP,4318/TCP,9411/TCP                                                   20m
tempo                                            ClusterIP   172.20.236.87    <none>        3100/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP   20m

Can you please help? I have been stuck on this issue for the past 5 weeks and have not found any clue to fix it.

In the OTel Collector logs I am getting this error:

zapgrpc/zapgrpc.go:195  [core] [Channel #2 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4318", ServerName: "tempo.monitoring-loki.svc.cluster.local:4318", }. Err: connection error: desc = "error reading server preface: http2: frame too large"      {"grpc_log": true}

Hi, unfortunately I can't help with OTel- and EKS-related things; I don't work on those, so I don't have experience with them.

I am happy to help with Tempo-related issues.

This error makes me think that your OTel Collector is not able to reach Tempo.
It could be a networking-related issue; check whether you can reach Tempo from your OTel Collector.
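
For reference, given the Service listing above (a ClusterIP Service named tempo in the monitoring-loki namespace exposing 4317), the collector's OTLP exporter endpoint would normally follow the <service>.<namespace>.svc.cluster.local pattern. A sketch under that assumption, not a verified fix for this cluster:

exporters:
  otlp:
    # Kubernetes Service DNS: <service>.<namespace>.svc.cluster.local
    endpoint: tempo.monitoring-loki.svc.cluster.local:4317
    tls:
      insecure: true   # the otlp exporter defaults to TLS; Tempo's OTLP receiver here is plaintext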

I have fixed the connectivity issue by modifying the FQDN; now I am getting a frame-size issue.

This is basically a Tempo-related issue, as Tempo is discarding the messages due to the larger frame size:

Err: connection error: desc = "error reading server preface: http2: frame too large"  

The above is the error I am getting; the connection is discarded by Tempo. The Tempo docs say to increase the frame size, so I did it as below:

---
storage:
  trace:
    backend: local 
    local:
      volume:
        persistentVolumeClaim:
          claimName: storage-tempo-0

spec:
  server:
    logLevel: debug

server:
  grpc_server_max_recv_msg_size: 16384
  grpc_server_max_send_msg_size: 16384

distributor:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 16384
          http:
            endpoint: 0.0.0.0:4318

querier:
    frontend_worker:
        grpc_client_config:
            max_send_msg_size: 16384

traces:
  otlp:
    grpc:
      enabled: false
    http: 
      enabled: true
  zipkin:
    enabled: false
  jaeger: 
    thriftHttp:
      enabled: false
  opencensus:
    enabled: false

Can you please help here?
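
One detail worth double-checking in the server block above: grpc_server_max_recv_msg_size and grpc_server_max_send_msg_size are expressed in bytes (the default is 4194304, i.e. 4 MiB), so 16384 is only 16 KiB and actually lowers the limit, while the receiver's max_recv_msg_size_mib is in MiB. If the goal is to raise the limits, a sketch along these lines (mirroring the structure of the values above) seems closer to the intent:

server:
  grpc_server_max_recv_msg_size: 16777216   # bytes (16 MiB); the default is 4194304
  grpc_server_max_send_msg_size: 16777216

distributor:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 16   # this one is in MiB, not bytes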

Are you running any proxy in front of Tempo? This can happen when you try to send HTTP/2 traffic to an HTTP/1.1 server.

This thread seems related: desc = "error reading server preface: http2: frame too large" · Issue #646 · open-telemetry/opentelemetry-helm-charts · GitHub
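
Concretely, "error reading server preface: http2: frame too large" from a gRPC client usually means the otlp (gRPC, HTTP/2) exporter is pointed at a plain HTTP/1.1 listener; the log earlier in the thread shows it dialing port 4318, which is the OTLP/HTTP port. A sketch of the two consistent pairings on the collector side, assuming the corresponding receiver is enabled on the Tempo side (in the latest values above, only the HTTP receiver is enabled):

exporters:
  # option 1: OTLP over gRPC -> Tempo's gRPC receiver on 4317
  otlp:
    endpoint: tempo.monitoring-loki.svc.cluster.local:4317
    tls:
      insecure: true
  # option 2: OTLP over HTTP -> Tempo's HTTP receiver on 4318
  otlphttp:
    endpoint: http://tempo.monitoring-loki.svc.cluster.local:4318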

I have an ingress only in order to access Grafana, that's all; I am not specifying any specific ingress to access Tempo. I am adding a data source in Grafana in order to access Loki.
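
As an aside on that last point, the Loki and Tempo data sources can also be provisioned declaratively in Grafana, with a derived field that extracts a trace ID from the log line and links it to Tempo. This is a sketch only, with Service names and ports taken from the listing earlier in the thread and an illustrative regex:

apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo.monitoring-loki.svc.cluster.local:3100   # Tempo's HTTP port per the Service listing
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-svc.monitoring-loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id[":= ]+(\w+)'   # illustrative; adjust to your log format
          url: '${__value.raw}'
          datasourceUid: tempo   # links the matched ID to the Tempo data source above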