Metrics of single service in k8s totally messed up

Hello,

I just set up a /metrics endpoint for one of the services in my k8s cluster. Short info: everything is set up via Terraform, multiple applications run in the cluster, and the monitoring stack consists of:

  • Grafana,
  • Grafana Agent,
  • Mimir.

Now everything works fine for all services but one (there’s a PodMonitor on top of each application, and the PodMonitors are scraped by the Grafana Agents). When I curl the affected service’s /metrics endpoint manually, everything looks normal so far, i.e.:

http_request_duration_seconds_count{path="GET /health",status_code="200"} 187

http_request_duration_seconds_count{path="GET /metrics",status_code="200"} 1

http_request_duration_seconds_count{path="GET /test",status_code="200"} 2

The /health counter presumably increases because of the probes, and the /metrics counter because of my request. However, when I go to Grafana and query

http_request_duration_seconds_count{container="test-app"}

where "test-app" is the name of the application, I get totally messed up results, e.g.

http_request_duration_seconds_count{container="test-app", endpoint="port-15000", …, path="GET /health", status_code="200"} → Value = 12

http_request_duration_seconds_count{container="test-app", endpoint="port-15000", …, path="GET /metrics", status_code="200"} → Value = 154
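
To rule out that these series come from more than one scrape target (every scrape hits /metrics itself, which might at least explain the inflated counter there), I was planning to count the distinct targets with something like the query below. The "instance" and "endpoint" label names are my guess at what the PodMonitor relabeling adds:

count by (pod, instance, endpoint) (http_request_duration_seconds_count{container="test-app"})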

There’s exactly one pod running for this application; the app inside is PHP (Laravel on Swoole), which I guess could mess up the results because of its worker processes, but that doesn’t explain the high value for the /metrics path.
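
If each Swoole worker keeps its own in-memory counters, I’d expect the stored series to show counter resets whenever the scrape hits a different worker. Assuming I understand the resets() function correctly, a query roughly like this should reveal that:

resets(http_request_duration_seconds_count{container="test-app", path="GET /health"}[1h])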

The /test route is completely missing from the Grafana results.

The pod names in the labels returned from Mimir/Grafana are correct. Since it’s working fine for all the other services (Node.js/NestJS), I’m totally lost now.
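
To double-check how many targets the Grafana Agent actually ends up with for this pod, I was also going to count the up series for it (assuming the PodMonitor adds the usual pod target label and that the pod name starts with the app name):

count(up{pod=~"test-app.*"})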

Any idea what’s causing this issue or how to debug it?

Thanks in advance!