No metrics being generated

I’m running the OSS stack in a development environment with Docker. Traces are being generated successfully and I can query them in Grafana. However, I cannot get metrics generation or service graphs working for the life of me. Any help is much appreciated!

I have Tempo configured to generate metrics as follows:

tempo.yaml

...

metrics_generator:
  processor:
    service_graphs:
    span_metrics:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /var/tempo/generator/wal
    wal:
    remote_write_flush_deadline: 30s
    remote_write:
      - url: http://host.docker.internal:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks]
...
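
For context, Prometheus only accepts pushes on /api/v1/write when its remote-write receiver is enabled, so I’m assuming a compose service along these lines on the receiving end (a sketch; the image tag and config path are illustrative):

prometheus:
  image: prom/prometheus:latest
  command:
    # load the usual scrape config
    - --config.file=/etc/prometheus/prometheus.yml
    # allow Tempo's metrics-generator to push via remote write
    - --web.enable-remote-write-receiver
  ports:
    - "9090:9090"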

And in the logs, I can see the following:

$ docker-compose logs -f tempo

tempo-1  | level=info ts=2024-05-29T16:42:27.186169918Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:42:42.1872593Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:42:57.186681168Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:43:12.187923717Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:43:27.187517543Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:43:42.187359467Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:43:57.188085252Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:44:12.188117259Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:44:27.187242793Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:44:42.187321925Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:44:57.18904496Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:45:12.188460425Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:45:27.186341627Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:45:36.611407423Z caller=poller.go:136 msg="blocklist poll complete" seconds=0.00014825
tempo-1  | level=info ts=2024-05-29T16:45:42.191599509Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:45:57.185562127Z caller=registry.go:257 tenant=single-tenant msg="deleted stale series" active_series=0
tempo-1  | level=info ts=2024-05-29T16:45:57.186138252Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:46:12.187804967Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:46:27.188579127Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:46:42.186421217Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:46:57.187713293Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:47:12.191388217Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:47:27.189738502Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:47:42.186984175Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:47:57.187703418Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0
tempo-1  | level=info ts=2024-05-29T16:48:12.19496105Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=0

But nothing is being sent to Prometheus, and what I find odd is that the WAL files are empty (zero bytes):

/var/tempo/generator # tree -ah
[4.0K]  .
β”œβ”€β”€ [4.0K]  traces
β”‚   └── [4.0K]  single-tenant
β”‚       β”œβ”€β”€ [4.0K]  blocks
β”‚       β”‚   └── [4.0K]  single-tenant
β”‚       β”‚       β”œβ”€β”€ [4.0K]  13a28e4a-d15e-45a1-a841-1414e8e59e57
β”‚       β”‚       β”‚   β”œβ”€β”€ [100K]  bloom-0
β”‚       β”‚       β”‚   β”œβ”€β”€ [ 20K]  data.parquet
β”‚       β”‚       β”‚   β”œβ”€β”€ [  42]  index
β”‚       β”‚       β”‚   └── [ 414]  meta.json
β”‚       β”‚       β”œβ”€β”€ [4.0K]  7d9a5da1-51ec-4b45-97b5-5c018848d727
β”‚       β”‚       β”‚   β”œβ”€β”€ [100K]  bloom-0
β”‚       β”‚       β”‚   β”œβ”€β”€ [ 22K]  data.parquet
β”‚       β”‚       β”‚   β”œβ”€β”€ [  42]  index
β”‚       β”‚       β”‚   └── [ 415]  meta.json
β”‚       β”‚       β”œβ”€β”€ [4.0K]  ee59dd30-075c-4dfa-ae17-dc2c7192d5f3
β”‚       β”‚       β”‚   β”œβ”€β”€ [100K]  bloom-0
β”‚       β”‚       β”‚   β”œβ”€β”€ [ 21K]  data.parquet
β”‚       β”‚       β”‚   β”œβ”€β”€ [  42]  index
β”‚       β”‚       β”‚   └── [ 415]  meta.json
β”‚       β”‚       └── [4.0K]  f4bfddf1-e9a6-463e-b33f-72ed561d3836
β”‚       β”‚           β”œβ”€β”€ [100K]  bloom-0
β”‚       β”‚           β”œβ”€β”€ [ 22K]  data.parquet
β”‚       β”‚           β”œβ”€β”€ [  42]  index
β”‚       β”‚           └── [ 415]  meta.json
β”‚       └── [4.0K]  e296a42a-0dfa-4dd6-9581-a085f3037f6d+single-tenant+vParquet3
β”‚           └── [   0]  0000000001
└── [4.0K]  wal
    └── [4.0K]  single-tenant
        β”œβ”€β”€ [   0]  lock
        └── [4.0K]  wal
            └── [   0]  00000000
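
For what it’s worth, this is roughly how I’m confirming the Prometheus side (a sketch against the Prometheus query API, using one of the span-metrics series names; 9090 is the default port):

$ curl -s 'http://localhost:9090/api/v1/query?query=traces_spanmetrics_size_total'

The result vector comes back empty.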

Here’s an example trace: Trace-f33d2e-2024-05-29 18_00_08.json - Google Drive

I’ve trawled the docs, forums, GitHub, and anywhere else I can find that mentions issues with metrics generation, but I keep banging my head against this. What’s most frustrating is that at some point I did get this working and saw metrics in Prometheus, such as traces_spanmetrics_size_total, but it was short-lived, and I can’t figure out why it started working or why it stopped so abruptly.

Please help!


Hi, you need to enable the processors for your tenants. Processors are disabled by default and can be enabled individually per tenant; this is also briefly mentioned in the documentation.

There are two ways to achieve this:

  • set the metrics_generator_processors override for your tenant:
overrides:
  "my-tenant":
    metrics_generator_processors:
      - span-metrics
      - service-graphs
    # any other overrides you want to set...
  • set default overrides (these apply to all tenants that don’t have an entry in the runtime overrides; see the sketch after these examples):
overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
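
To expand on the "runtime overrides" part: those live in a separate file that Tempo polls while running, referenced from tempo.yaml. A rough sketch (the file path and tenant name are just examples):

# in tempo.yaml
overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
  per_tenant_override_config: /conf/overrides.yaml

# /conf/overrides.yaml (the runtime overrides file, polled periodically)
overrides:
  "single-tenant":
    metrics_generator_processors:
      - span-metrics
      - service-graphs

Any tenant with an entry in that file takes its values from there rather than from the defaults block.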

Thanks, @koenraad, for your help, both here and on the Slack channel. I had already added the overrides section. What worked was rewriting the configuration from scratch. However, I’m sorry to say that I don’t know exactly why.

For anyone who finds their way here with a similar problem, I think the most likely explanation is this one from @koenraad:

We have an endpoint /status/overrides and /status/overrides/{tenant} which will show the actual overrides used for your tenant. It could be that the default overrides were overridden by runtime overrides.
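
For anyone else landing here: those endpoints can be queried directly against Tempo’s HTTP port (3200 by default; adjust if your container maps it differently):

$ curl -s http://localhost:3200/status/overrides
$ curl -s http://localhost:3200/status/overrides/single-tenant

Comparing that output with what you expect from tempo.yaml shows quickly whether runtime overrides are clobbering your defaults.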