I've been stuck on this issue for more than two weeks. I use Fluent Bit to ship logs along this pipeline: Fluent Bit → Loki, and OTel Collector → Tempo. I have made all the changes, but I can't get traces into Tempo; querying gives the error “failed to get trace with id: 154717fffa19312ec6fea8533b64400f Status: 404 Not Found Body: trace not found”. Can someone please help? (I deployed “tempo”, “loki”, and “otel-collector” via Helm.) Here is my Fluent Bit ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: monitoring-loki
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush            1
        Log_Level        info
        Daemon           off
        Parsers_File     parsers.conf

    [INPUT]
        Name             tail
        Path             /var/log/*.log
        Parser           docker
        Tag              kube.*
        Refresh_Interval 5
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

    [OUTPUT]
        Name                   loki
        Match                  kube.*
        Host                   "loki-svc.monitoring-loki"
        tenant_id              ""
        Port                   "3100"
        label_keys             $trace_id
        auto_kubernetes_labels on

    [OUTPUT]
        Name       opentelemetry
        Match      kube.*
        Host       "otel-collector-svc.monitoring-loki"
        Port       "55680"
        Traces_uri /v1/traces
        Logs_uri   /v1/logs
```
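One thing worth noting here: this output posts to port 55680, while the collector values below expose the OTLP HTTP receiver on 4318, so the two need to agree. A quick way to probe the receiver from inside the cluster (a sketch only; the throwaway curl pod is an assumption, the service name is taken from the config above — an empty OTLP/HTTP request should come back with a 200 if the receiver is reachable on that port):

```sh
# Throwaway pod that POSTs an empty OTLP/HTTP trace request to the
# collector; a "200" printed here means the receiver answered.
kubectl -n monitoring-loki run curl-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST -H "Content-Type: application/json" -d '{}' \
    http://otel-collector-svc.monitoring-loki:4318/v1/traces
```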
And here are my otel-collector Helm values:

```yaml
mode: daemonset

presets:
  # enables the k8sattributesprocessor and adds it to the traces, metrics, and logs pipelines
  kubernetesAttributes:
    enabled: true
  # enables the kubeletstatsreceiver and adds it to the metrics pipelines
  kubeletMetrics:
    enabled: true
  # enables the filelogreceiver and adds it to the logs pipelines
  logsCollection:
    enabled: true

## The chart only includes the loggingexporter by default.
## If you want to send your data somewhere you need to
## configure an exporter, such as the otlpexporter.
config:
  exporters:
    logging:
      loglevel: info
    otlp:
      endpoint: "tempo-monitoring-loki.svc.cluster.local:4317"
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:14250"
        http:
          endpoint: "0.0.0.0:4318"
  service:
    pipelines:
      traces:
        exporters:
          - logging
          - otlp
        receivers:
          - otlp
        processors:
          - memory_limiter
          - batch
      # metrics:
      #   exporters: [ otlp ]
      # logs:
      #   exporters: [ otlp ]
```
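Note that the exporter endpoint here, `tempo-monitoring-loki.svc.cluster.local`, does not follow the in-cluster Service DNS form `<service>.<namespace>.svc.cluster.local` (the namespace label is missing), and the runtime logs further down dial `tempo.monitoring-loki.svc.cluster.local` instead. So the first step is to confirm what Tempo Service actually exists:

```sh
# List the Services the Tempo chart actually created; the otlp exporter
# endpoint must be <service-name>.<namespace>.svc.cluster.local:4317.
kubectl -n monitoring-loki get svc
```

If, say, the chart created a Service named `tempo` (an assumption — use whatever the listing shows), the exporter would look roughly like:

```yaml
config:
  exporters:
    otlp:
      # <service-name> is whatever `kubectl get svc` reports; "tempo" is a guess
      endpoint: "tempo.monitoring-loki.svc.cluster.local:4317"
      tls:
        insecure: true  # assumption: Tempo's OTLP gRPC ingest is plaintext here
```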
Once I run a trace query in Tempo, I get the error below:

“failed to get trace with id: 803196185ebb35907e9669404eef14dd Status: 404 Not Found Body: trace not found”

This is deployed in an EKS cluster.
Drilling further into the otel-collector pod logs:

```
warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
```
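The `no such host` answer from the cluster DNS (172.20.0.10) can be reproduced outside the collector with a throwaway pod (a sketch; busybox is an assumption, any image with `nslookup` works):

```sh
# If this also fails, no Service named "tempo" exists in monitoring-loki
# and the exporter endpoint has to be changed to the real Service name.
kubectl -n monitoring-loki run dns-test --rm -it --restart=Never \
  --image=busybox -- nslookup tempo.monitoring-loki.svc.cluster.local
```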
Below are my services.

Further, these are the logs I am getting in the otel-collector pod:
```
2024-02-15T11:55:13.407Z info service@v0.93.0/telemetry.go:76 Setting up own telemetry...
2024-02-15T11:55:13.407Z info service@v0.93.0/telemetry.go:146 Serving metrics {"address": "10.92.114.166:8888", "level": "Basic"}
2024-02-15T11:55:13.408Z info memorylimiter/memorylimiter.go:160 Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "total_memory_mib": 15692, "limit_percentage": 80, "spike_limit_percentage": 25}
2024-02-15T11:55:13.408Z info memorylimiter/memorylimiter.go:77 Memory limiter configured {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "limit_mib": 12553, "spike_limit_mib": 3923, "check_interval": 5}
2024-02-15T11:55:13.408Z info exporter@v0.93.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-02-15T11:55:13.408Z info exporter@v0.93.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-02-15T11:55:13.409Z info service@v0.93.0/service.go:139 Starting otelcol-contrib... {"Version": "0.93.0", "NumCPU": 4}
2024-02-15T11:55:13.409Z info extensions/extensions.go:34 Starting extensions...
2024-02-15T11:55:13.409Z info extensions/extensions.go:37 Extension is starting... {"kind": "extension", "name": "health_check"}
2024-02-15T11:55:13.409Z info healthcheckextension@v0.93.0/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"10.92.114.166:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-02-15T11:55:13.409Z info extensions/extensions.go:52 Extension started. {"kind": "extension", "name": "health_check"}
2024-02-15T11:55:13.409Z warn internal@v0.93.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-02-15T11:55:13.409Z info otlpreceiver@v0.93.0/otlp.go:102 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-02-15T11:55:13.410Z warn internal@v0.93.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-02-15T11:55:13.410Z info otlpreceiver@v0.93.0/otlp.go:152 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}
2024-02-15T11:55:13.410Z info k8sobjectsreceiver@v0.93.0/receiver.go:73 Object Receiver started {"kind": "receiver", "name": "k8sobjects", "data_type": "logs"}
2024-02-15T11:55:13.410Z info k8sobjectsreceiver@v0.93.0/receiver.go:93 Started collecting {"kind": "receiver", "name": "k8sobjects", "data_type": "logs", "gvr": "events.k8s.io/v1, Resource=events", "mode": "watch", "namespaces": []}
2024-02-15T11:55:13.410Z info prometheusreceiver@v0.93.0/metrics_receiver.go:240 Starting discovery manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-02-15T11:55:13.411Z info prometheusreceiver@v0.93.0/metrics_receiver.go:231 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "opentelemetry-collector"}
2024-02-15T11:55:13.411Z info prometheusreceiver@v0.93.0/metrics_receiver.go:282 Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-02-15T11:55:13.440Z info k8sclusterreceiver@v0.93.0/receiver.go:53 Starting shared informers and wait for initial cache sync. {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-02-15T11:55:13.443Z info healthcheck/handler.go:132 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2024-02-15T11:55:13.443Z info service@v0.93.0/service.go:165 Everything is ready. Begin running and processing data.
2024-02-15T11:55:13.449Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:13.649Z info k8sclusterreceiver@v0.93.0/receiver.go:74 Completed syncing shared informer caches. {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-02-15T11:55:14.454Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:16.356Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:19.149Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:21.993Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-02-15T11:55:23.786Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:23.798Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 686, "metrics": 1610, "data points": 1610}
2024-02-15T11:55:26.202Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-02-15T11:55:27.203Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 2}
2024-02-15T11:55:29.007Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 2}
2024-02-15T11:55:29.930Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:33.415Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-02-15T11:55:33.816Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 686, "metrics": 1612, "data points": 1614}
2024-02-15T11:55:39.486Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "tempo.monitoring-loki.svc.cluster.local:4317", ServerName: "tempo.monitoring-loki.svc.cluster.local:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup tempo.monitoring-loki.svc.cluster.local on 172.20.0.10:53: no such host" {"grpc_log": true}
2024-02-15T11:55:41.830Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 2}
2024-02-15T11:55:42.652Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 3, "log records": 3}
```
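These logs also hint that the running config may not match the values pasted above: the gRPC receiver starts on 0.0.0.0:4317 (the values say 14250), and the exporters are named `debug` rather than `logging`. It may be worth dumping the collector's rendered config and comparing (a sketch; the ConfigMap name is release dependent, `otel-collector` is only a guess):

```sh
# Find and dump the collector's rendered config to compare with the values
kubectl -n monitoring-loki get configmap | grep -i otel
kubectl -n monitoring-loki get configmap <otel-configmap-name> -o yaml
```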
Below are my services in the monitoring-loki namespace for otel and tempo.
This error makes me think that your otel-collector is not able to reach Tempo.
It could be a networking-related issue; check whether you can reach Tempo from your otel-collector.
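One way to test that from inside the cluster (a sketch; `<tempo-service>` is a placeholder for whatever Service name actually exists, and the throwaway `bash` image is an assumption):

```sh
# TCP-level probe of Tempo's OTLP gRPC port; "open" requires both a
# successful DNS lookup and a listener on 4317.
kubectl -n monitoring-loki run net-test --rm -it --restart=Never \
  --image=bash --command -- bash -c \
  'timeout 3 bash -c "exec 3<>/dev/tcp/<tempo-service>.monitoring-loki.svc.cluster.local/4317" && echo open || echo closed'
```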
I have an ingress in order to access Grafana, that's all; I am not specifying any ingress to access Tempo. I am adding a data source in Grafana in order to access Loki.
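For what it's worth, Tempo doesn't need an ingress for an in-cluster Grafana to query it, but it does need its own Tempo data source pointing at Tempo's HTTP API (port 3200 by default). A provisioning sketch (not from this thread; the Service name `tempo` is an assumption, use the real one):

```yaml
# Grafana datasource provisioning sketch
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    # <service>.<namespace>:<tempo-http-port>; 3200 is Tempo's default
    url: http://tempo.monitoring-loki:3200
```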