How can I check whether traces are being forwarded by the Agent to the distributor, and whether the distributor is forwarding them correctly?

I am trying to deploy Grafana Tempo with the flow below:
Synthetic Load Generator app ----> Grafana Agent ----> Grafana Tempo ----> Grafana Dashboard
The protocol in use is OTLP over HTTP. I cannot tell whether the traces are being forwarded correctly. The app logs (synthetic load generator) show that traces are being emitted, but I cannot see them on the Grafana dashboard.

Below are the logs from the Grafana Agent:

ts=2022-06-10T20:49:29.150606409Z caller=server.go:195 level=info msg="server listening on addresses" http=127.0.0.1:8080 grpc=127.0.0.1:12346 http_tls_enabled=false grpc_tls_enabled=false
ts=2022-06-10T20:49:29.151289286Z caller=node.go:85 level=info agent=prometheus component=cluster msg="applying config"
ts=2022-06-10T20:49:29Z level=info caller=traces/traces.go:143 msg="Traces Logger Initialized" component=traces
ts=2022-06-10T20:49:29.152656812Z caller=remote.go:180 level=info agent=prometheus component=cluster msg="not watching the KV, none set"
ts=2022-06-10T20:49:29Z level=info caller=traces/instance.go:141 msg="shutting down receiver" component=traces traces_config=default
ts=2022-06-10T20:49:29Z level=info caller=traces/instance.go:141 msg="shutting down processors" component=traces traces_config=default
ts=2022-06-10T20:49:29Z level=info caller=traces/instance.go:141 msg="shutting down exporters" component=traces traces_config=default
ts=2022-06-10T20:49:29Z level=info caller=traces/instance.go:141 msg="shutting down extensions" component=traces traces_config=default
ts=2022-06-10T20:49:29Z level=info caller=builder/exporters_builder.go:255 msg="Exporter was built." component=traces traces_config=default kind=exporter name=otlp/0
ts=2022-06-10T20:49:29Z level=info caller=builder/exporters_builder.go:40 msg="Exporter is starting..." component=traces traces_config=default kind=exporter name=otlp/0
ts=2022-06-10T20:49:29Z level=info caller=builder/exporters_builder.go:48 msg="Exporter started." component=traces traces_config=default kind=exporter name=otlp/0
ts=2022-06-10T20:49:29Z level=info caller=builder/pipelines_builder.go:223 msg="Pipeline was built." component=traces traces_config=default name=pipeline name=traces
ts=2022-06-10T20:49:29Z level=info caller=builder/pipelines_builder.go:54 msg="Pipeline is starting..." component=traces traces_config=default name=pipeline name=traces
ts=2022-06-10T20:49:29Z level=info caller=builder/pipelines_builder.go:65 msg="Pipeline is started." component=traces traces_config=default name=pipeline name=traces
ts=2022-06-10T20:49:29Z level=info caller=builder/receivers_builder.go:226 msg="Receiver was built." component=traces traces_config=default kind=receiver name=otlp datatype=traces
ts=2022-06-10T20:49:29Z level=info caller=builder/receivers_builder.go:226 msg="Receiver was built." component=traces traces_config=default kind=receiver name=push_receiver datatype=traces
ts=2022-06-10T20:49:29Z level=info caller=builder/receivers_builder.go:68 msg="Receiver is starting..." component=traces traces_config=default kind=receiver name=otlp
ts=2022-06-10T20:49:29Z level=info caller=otlpreceiver/otlp.go:87 msg="Starting HTTP server on endpoint 0.0.0.0:55680" component=traces traces_config=default kind=receiver name=otlp
ts=2022-06-10T20:49:29Z level=info caller=builder/receivers_builder.go:73 msg="Receiver started." component=traces traces_config=default kind=receiver name=otlp
ts=2022-06-10T20:49:29Z level=info caller=builder/receivers_builder.go:68 msg="Receiver is starting..." component=traces traces_config=default kind=receiver name=push_receiver
ts=2022-06-10T20:49:29Z level=info caller=builder/receivers_builder.go:73 msg="Receiver started." component=traces traces_config=default kind=receiver name=push_receiver
ts=2022-06-10T20:49:29.160141007Z caller=manager.go:231 level=debug msg="Applying integrations config changes"
ts=2022-06-10T20:49:29.162172836Z caller=manager.go:228 level=debug msg="Integrations config is unchanged skipping apply"
ts=2022-06-10T20:49:29.162340031Z caller=reporter.go:107 level=info msg="running usage stats reporter"
ts=2022-06-10T20:49:44.153454393Z caller=config_watcher.go:139 level=debug agent=prometheus component=cluster msg="waiting for next reshard interval" last_reshard=2022-06-10T20:49:44.153366114Z next_reshard=2022-06-10T20:50:44.153366114Z remaining=59.999990048s
ts=2022-06-10T20:50:44.153971195Z caller=config_watcher.go:106 level=debug agent=prometheus component=cluster msg="reshard timer ticked, scheduling refresh"
ts=2022-06-10T20:50:44.154072092Z caller=config_watcher.go:147 level=debug agent=prometheus component=cluster msg="successfully scheduled a refresh"
ts=2022-06-10T20:50:44.154086993Z caller=config_watcher.go:139 level=debug agent=prometheus component=cluster msg="waiting for next reshard interval" last_reshard=2022-06-10T20:50:44.154080723Z next_reshard=2022-06-10T20:51:44.154080723Z remaining=59.999998816s
ts=2022-06-10T20:50:44.154103766Z caller=config_watcher.go:163 level=debug agent=prometheus component=cluster msg="refresh skipped because clustering is disabled"
ts=2022-06-10T20:50:44.154113467Z caller=config_watcher.go:139 level=debug agent=prometheus component=cluster msg="waiting for next reshard interval" last_reshard=2022-06-10T20:50:44.154080723Z next_reshard=2022-06-10T20:51:44.154080723Z remaining=59.999968604s
ts=2022-06-10T20:51:44.155162883Z caller=config_watcher.go:106 level=debug agent=prometheus component=cluster msg="reshard timer ticked, scheduling refresh"
ts=2022-06-10T20:51:44.155247031Z caller=config_watcher.go:147 level=debug agent=prometheus component=cluster msg="successfully scheduled a refresh"
ts=2022-06-10T20:51:44.155269105Z caller=config_watcher.go:139 level=debug agent=prometheus component=cluster msg="waiting for next reshard interval" last_reshard=2022-06-10T20:51:44.155258172Z next_reshard=2022-06-10T20:52:44.155258172Z remaining=59.999998769s
ts=2022-06-10T20:51:44.155314634Z caller=config_watcher.go:163 level=debug agent=prometheus component=cluster msg="refresh skipped because clustering is disabled"
ts=2022-06-10T20:51:44.155335545Z caller=config_watcher.go:139 level=debug agent=prometheus component=cluster msg="waiting for next reshard interval" last_reshard=2022-06-10T20:51:44.155258172Z next_reshard=2022-06-10T20:52:44.155258172Z remaining=59.999924474s

Logs from distributor pod:

level=info ts=2022-06-10T22:41:36.55767082Z caller=main.go:191 msg="initialising OpenTracing tracer"
level=info ts=2022-06-10T22:41:36.582889977Z caller=main.go:106 msg="Starting Tempo" version="(version=, branch=HEAD, revision=d3880a979)"
level=info ts=2022-06-10T22:41:36.583853283Z caller=server.go:260 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
ts=2022-06-10T22:41:36Z level=info msg="OTel Shim Logger Initialized" component=tempo
level=info ts=2022-06-10T22:41:36.585850483Z caller=memberlist_client.go:394 msg="Using memberlist cluster node name" name=tempo-tempo-distributed-distributor-54664dc5c4-tw9ph-e26671a8
level=info ts=2022-06-10T22:41:36.587323877Z caller=module_service.go:64 msg=initialising module=server
level=info ts=2022-06-10T22:41:36.587553021Z caller=module_service.go:64 msg=initialising module=memberlist-kv
level=info ts=2022-06-10T22:41:36.587613975Z caller=module_service.go:64 msg=initialising module=overrides
level=info ts=2022-06-10T22:41:36.587762766Z caller=module_service.go:64 msg=initialising module=ring
level=info ts=2022-06-10T22:41:36.589117247Z caller=ring.go:272 msg="ring doesn't exist in KV store yet"
level=info ts=2022-06-10T22:41:36.589265478Z caller=module_service.go:64 msg=initialising module=distributor
ts=2022-06-10T22:41:36Z level=info msg="Starting HTTP server on endpoint 0.0.0.0:55681" component=tempo
level=info ts=2022-06-10T22:41:36.589551498Z caller=app.go:284 msg="Tempo started"
level=info ts=2022-06-10T22:41:36.599493812Z caller=memberlist_client.go:513 msg="joined memberlist cluster" reached_nodes=3
level=warn ts=2022-06-10T22:43:02.854960949Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
ts=2022-06-10T22:43:06.590276921Z caller=memberlist_logger.go:74 level=warn msg="Was able to connect to tempo-tempo-distributed-ingester-0-bd5c0f84 but other probes failed, network may be misconfigured"
level=warn ts=2022-06-10T22:43:07.860665534Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:43:09.59238309Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:43:11.5905154Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
ts=2022-06-10T22:43:11.594984883Z caller=memberlist_logger.go:74 level=info msg="Suspect tempo-tempo-distributed-distributor-67f78d648f-gcptg-4ecef0db has failed, no acks received"
level=warn ts=2022-06-10T22:43:12.864858192Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:43:14.597544152Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:43:20.174516522Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
ts=2022-06-10T22:43:22.86079914Z caller=memberlist_logger.go:74 level=info msg="Marking tempo-tempo-distributed-distributor-67f78d648f-gcptg-4ecef0db as failed, suspect timeout reached (2 peer confirmations)"
level=warn ts=2022-06-10T22:43:28.590684881Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
ts=2022-06-10T22:43:31.58925098Z caller=memberlist_logger.go:74 level=warn msg="Was able to connect to tempo-tempo-distributed-querier-6c5955ffc5-rpcb6-da45d3d3 but other probes failed, network may be misconfigured"
level=warn ts=2022-06-10T22:43:42.591779235Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:43:47.592646516Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:43:53.589591088Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"
level=warn ts=2022-06-10T22:44:17.18557509Z caller=tcp_transport.go:428 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.215.77.101:7946 err="dial tcp 10.215.77.101:7946: i/o timeout"

I am not able to figure out what the issue is. Can you please suggest what I can do to troubleshoot it further?

Thank you.

Please share the solution if you have solved this; I am currently facing the same issue.

Hi!

I would start by checking whether the Agent is receiving and forwarding spans correctly. For this, check the traces_receiver_accepted_spans and traces_receiver_refused_spans metrics to see if traces are being received. Also check traces_exporter_sent_spans and traces_exporter_send_failed_spans to verify that traces are being sent to Tempo.
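As a sketch of how to inspect those counters: the Agent exposes them on its HTTP port (8080 in the config later in this thread), which you could reach with kubectl port-forward and curl. The kubectl/curl lines below are commented-out assumptions about your cluster; the runnable part just demonstrates the grep filter against an illustrative (made-up) metrics payload:

```shell
# In a live cluster you would do something like (pod name and namespace are assumptions):
#   kubectl -n dock port-forward <agent-pod> 8080:8080
#   curl -s http://localhost:8080/metrics
# Illustrative payload standing in for the curl output:
sample='traces_receiver_accepted_spans{receiver="otlp"} 120
traces_receiver_refused_spans{receiver="otlp"} 0
traces_exporter_sent_spans{exporter="otlp"} 118
traces_exporter_send_failed_spans{exporter="otlp"} 0'

# Filter for the receiver/exporter span counters; non-zero refused or
# send_failed counters point at where spans are being dropped
printf '%s\n' "$sample" | grep -E 'traces_(receiver|exporter)_'
```

If accepted stays at zero, the load generator is not reaching the Agent; if sent stays at zero while accepted grows, the Agent cannot reach Tempo.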

If the Agent is working correctly, you can move on to Tempo. Check tempo_distributor_bytes_received_total, or enable the config option log_received_traces in the distributor, to see whether Tempo is ingesting traces.
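For reference, enabling that distributor option in the Tempo config could look roughly like this (a sketch; verify the option name against the docs for your Tempo version):

```yaml
distributor:
  log_received_traces: true   # logs incoming traces so you can confirm ingestion
```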

Any additional information about your setup, such as the Grafana Agent and Tempo versions, would help with debugging. Thanks!

Hi @mariorodriguez !

Thank you for looking into this. I am using Grafana Agent v0.25.0 and Tempo v1.4.1. Let me provide more details on what I have implemented on my end.

I have a few questions about how to find the metrics (for the Grafana Agent and Tempo) that you mentioned in your previous comment.

  • How can I access the metrics for the Grafana Agent? Is there any documentation I can refer to? I am deploying the Agent as a DaemonSet in an AWS EKS cluster.

  • I am deploying Tempo as a distributed system, so each component is deployed as a Deployment in the cluster. How can I access the metrics for Tempo? As per the documentation, we need to access http://<tempo-address>:<port>/metrics, but what would the Tempo address be in the microservices architecture? Is it the address of the distributor component?

  • My goal is to deploy Grafana Tempo with the Grafana Agent in an AWS EKS cluster.

  • I am using the following YAML file to deploy the Grafana Agent as a DaemonSet:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent-traces
---
apiVersion: v1
data:
  agent.yaml: |
   server:
      http_listen_port: 8080
      log_level: debug
   traces:
      configs:
      - name: default
        remote_write:
          - endpoint: http://tempo-tempo-distributed-distributor:55681
            insecure: true
        receivers:
          otlp:
            protocols:
              http:
                endpoint: 0.0.0.0:55680
        automatic_logging:
          backend: stdout
          roots: true
kind: ConfigMap
metadata:
  name: grafana-agent-traces
  namespace: dock
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: grafana-agent-traces
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent-traces
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent-traces
subjects:
- kind: ServiceAccount
  name: grafana-agent-traces
  namespace: dock
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: grafana-agent-traces
  name: grafana-agent-traces
spec:
  ports:
  - name: agent-http-metrics
    port: 8080
    targetPort: 8080
  - name: agent-tempo-jaeger-thrift-compact
    port: 6831
    protocol: UDP
    targetPort: 6831
  - name: agent-tempo-jaeger-thrift-binary
    port: 6832
    protocol: UDP
    targetPort: 6832
  - name: agent-tempo-jaeger-thrift-http
    port: 14268
    protocol: TCP
    targetPort: 14268
  - name: agent-tempo-jaeger-grpc
    port: 14250
    protocol: TCP
    targetPort: 14250
  - name: agent-tempo-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  - name: agent-tempo-otlp
    port: 55680
    protocol: TCP
    targetPort: 55680
  - name: agent-tempo-opencensus
    port: 55678
    protocol: TCP
    targetPort: 55678
  selector:
    name: grafana-agent-traces
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grafana-agent-traces
  namespace: dock
spec:
  minReadySeconds: 10
  selector:
    matchLabels:
      name: grafana-agent-traces
  template:
    metadata:
      labels:
        name: grafana-agent-traces
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: namespace
                operator: In
                values:
                - dock
      containers:
      - args:
        - -config.file=/etc/agent/agent.yaml
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: grafana/agent:v0.25.0
        imagePullPolicy: IfNotPresent
        name: agent
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 6831
          name: tft-compact
          protocol: UDP
        - containerPort: 6832
          name: tft-binary
          protocol: UDP
        - containerPort: 14268
          name: tft-http
          protocol: TCP
        - containerPort: 14250
          name: jaeger-grpc
          protocol: TCP
        - containerPort: 9411
          name: zipkin
          protocol: TCP
        - containerPort: 55680
          name: otlp
          protocol: TCP
        - containerPort: 55678
          name: opencensus
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/agent
          name: grafana-agent-traces
      serviceAccount: grafana-agent-traces
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - configMap:
          name: grafana-agent-traces
        name: grafana-agent-traces
  updateStrategy:
    type: RollingUpdate

I have made some modifications (for example, I am using the OTLP HTTP protocol instead of the Jaeger protocol).

The change made in microservices-extras.yaml is the endpoint the load generator emits traces to:

    spec:
      containers:
      - env:
        - name: JAEGER_COLLECTOR_URL
          value: http://grafana-agent-traces:55680
        - name: TOPOLOGY_FILE
          value: /conf/load-generator.json
        image: omnition/synthetic-load-generator:1.0.25
        imagePullPolicy: IfNotPresent
        name: synthetic-load-gen
        volumeMounts:
        - mountPath: /conf
          name: conf

The change in microservices-grafana-values.yaml:

    datasources:
      - name: Tempo
        type: tempo
        access: proxy
        orgId: 1
        url: http://tempo-tempo-distributed-query-frontend:3100
        basicAuth: false
        isDefault: true
        version: 1
        editable: false
        apiVersion: 1
        uid: tempo

The changes in microservices-tempo-values.yaml:

traces:
  otlp:
    http:
      enabled: true
config: |
  query_frontend:
    search:
      max_duration: 0
  multitenancy_enabled: false
  search_enabled: true
  compactor:
    compaction:
      block_retention: 1440h
  distributor:
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:55681

Hey! Apologies for the delay.

Metric auto-instrumentation is handled via integrations in the Agent now. You will need to enable the Agent integration and select a metrics instance to forward the metrics to. The config could look something like this:

metrics:
  global:
    remote_write:
      - url: http://prometheus:9090/api/v1/write
  configs:
    - name: default

integrations:
  agent:
    enabled: true
    instance: default

Every component exposes a /metrics endpoint from which metrics can be scraped. If you’re running Tempo in the microservices architecture, you will need to scrape all components. The scrape config in Prometheus could look something like this:

scrape_configs:
  - job_name: 'tempo'
    static_configs:
      - targets:
        - 'compactor:3200'
        - 'distributor:3200'
        - 'ingester-0:3200'
        - 'ingester-1:3200'
        - 'ingester-2:3200'
        - 'querier:3200'
        - 'query-frontend:3200'
        - 'metrics-generator:3200'

Please note that :55680 is the old, deprecated port for OTLP. The current default port is 4317.
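For reference, a distributor receiver block using the current default OTLP ports might look like this (a sketch; 4317 is the gRPC default and 4318 the HTTP default):

```yaml
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317   # current OTLP gRPC default port
        http:
          endpoint: 0.0.0.0:4318   # current OTLP HTTP default port
```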

Pointing JAEGER_COLLECTOR_URL at the Agent’s OTLP port won’t work. The synthetic-load-generator is instrumented with Jaeger, which exports using the Jaeger HTTP Thrift protocol, so you will need to enable that receiver in the Agent in order to ingest the data. Most likely, if you look at the synthetic-load-generator logs, you will find errors about being unable to send data to the configured endpoint.
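A sketch of enabling the Jaeger HTTP Thrift receiver in the Agent’s traces config (14268 is the conventional thrift_http port, matching the Service definition earlier in this thread; adjust to your setup):

```yaml
traces:
  configs:
    - name: default
      receivers:
        jaeger:
          protocols:
            thrift_http:
              endpoint: 0.0.0.0:14268   # Jaeger HTTP Thrift receiver
```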

Also, the Grafana Agent exports traces over OTLP gRPC by default. To use OTLP HTTP, you will need to set protocol: http in the remote_write config. Refer to the docs for more configuration options.
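A sketch of the remote_write block with the protocol set explicitly (the endpoint name is taken from the config posted earlier in this thread; verify the port against your distributor’s receiver):

```yaml
traces:
  configs:
    - name: default
      remote_write:
        - endpoint: http://tempo-tempo-distributed-distributor:55681
          protocol: http   # default is grpc; set explicitly for OTLP HTTP
          insecure: true
```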

@mariorodriguez Thank you for the help! I was able to get it working with the information you shared. I changed the Grafana Agent receiver to Jaeger HTTP and the distributor to receive in the OTLP gRPC format.
