Hello,
I just set up a /metrics endpoint for one of my services in my k8s cluster. Short info: everything is set up via Terraform, there are multiple applications running in the cluster, and the monitoring stack consists of:
- Grafana,
- Grafana Agent,
- Mimir.
Now everything works fine for all services but one (there's a PodMonitor on top of each application, and the PodMonitors are scraped by the Grafana Agent). When I curl the affected service's /metrics endpoint manually, everything looks normal so far, i.e.:
http_request_duration_seconds_count{path="GET /health",status_code="200"} 187
http_request_duration_seconds_count{path="GET /metrics",status_code="200"} 1
http_request_duration_seconds_count{path="GET /test",status_code="200"} 2
The /health counter seems to increase due to the probes and the /metrics counter due to my request. However, when I go to Grafana and query
http_request_duration_seconds_count{container="test-app"}
where "test-app" is the name of the application, I get totally messed up results, e.g.
http_request_duration_seconds_count{container="test-app", endpoint="port-15000", …, path="GET /health", status_code="200"} → Value = 12
http_request_duration_seconds_count{container="test-app", endpoint="port-15000", …, path="GET /metrics", status_code="200"} → Value = 154
There's exactly one pod running for this application; the framework inside is PHP Laravel on Swoole (which could mess up the results because of its worker processes, I guess, but that doesn't explain the high value for the /metrics path).
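If the worker theory were right, I'd expect the stored series to show apparent counter resets whenever a scrape hits a different worker. I guess I could check that with something along these lines (just a sketch, the label values are taken from the results above):

resets(http_request_duration_seconds_count{container="test-app", path="GET /metrics"}[1h])

A value above zero would at least confirm that the counter isn't increasing monotonically on the scrape side.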
The /test route is completely missing from the results in Grafana.
The pod names in the labels returned from Mimir/Grafana point to the correct pod. Since it works fine for all the other services (Node.js & NestJS), I'm totally lost now.
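One thing I wanted to rule out is that more than one scrape target contributes to the same series even though there's only one pod. I guess a breakdown like this (again just a sketch) should show how many targets the samples actually come from:

count by (pod, instance) (http_request_duration_seconds_count{container="test-app"})

If that returns more than one series for the single pod, the PodMonitor is presumably scraping the same pod on several ports/endpoints.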
Any idea on what’s causing this issue or how to debug this?
Thanks in advance!