After upgrading our staging environment from Grafana 8 to Grafana 11, while still querying the same Amazon Managed Service for Prometheus (AMP) workspace, dashboard performance has deteriorated sharply. Panels that once rendered in under one second now take 12-15 seconds to load. Query Inspector shows the slowdown is entirely in PromQL execution, not front-end rendering. The issue affects large, multi-panel dashboards and smaller ones alike. CloudWatch panels don’t seem to suffer from this slowness.
Any idea how to troubleshoot it?
Increase Grafana log level and check debug logs for those queries.
Enable tracing in Grafana and check generated traces for those queries.
Of course, check resource usage on the server where Grafana is running.
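As a rough sketch of the first two suggestions, the `grafana.ini` fragment below raises the log level and points Grafana's OpenTelemetry tracing at an OTLP endpoint. The collector address `otel-collector:4317` is a placeholder I made up, so replace it with whatever tracing backend you end up running, and double-check the section names against the configuration docs for your exact Grafana 11 version.

```ini
# Sketch only: adjust to your deployment.

[log]
# Default is "info"; "debug" is verbose, so revert it after troubleshooting.
level = debug

[tracing.opentelemetry.otlp]
# Hypothetical OTLP gRPC endpoint (host:port) of your collector or tracing backend.
address = otel-collector:4317
```

If Grafana runs in a pod, the same settings can usually be passed as environment variables using the `GF_<SECTION>_<KEY>` convention, for example `GF_LOG_LEVEL=debug` and `GF_TRACING_OPENTELEMETRY_OTLP_ADDRESS=otel-collector:4317`.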
Resource utilisation on the pod and on the Grafana RDS is low. I will increase the log level and enable tracing.
I didn’t find anything useful in the logs, and we don’t have a tracing system. Any other ideas?
So it’s a good opportunity to set one up.
You didn’t provide a reproducible example, resource utilisation, logs, … so any other ideas would only be guesses.