I’ve figured out the issue and wanted to share my findings!
After reviewing my test setup and comparing it with a similar problem I ran into earlier (which I detailed in this post), I found that the root cause was incorrectly set tags. Because my URLs are dynamic, the requests for each API were being split into thousands of unique entities, so the quantiles were calculated per URL rather than per API and came out wrong. Once I added a consistent name tag for each API, the quantiles started working properly in Grafana.
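For anyone hitting the same thing, here is a minimal sketch of the idea, not my exact script. The helper name `nameTagFor` and the `:id` placeholder are illustrative assumptions. It only treats purely numeric segments and UUIDs as dynamic, so adjust the patterns to your own URLs.

```javascript
// Hypothetical helper: collapse dynamic path segments so that every request
// to the same API ends up with the same `name` tag value.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function nameTagFor(url) {
  // Strip scheme and host, then drop any query string or fragment.
  const path = url.replace(/^[a-z]+:\/\/[^/]+/i, '').split(/[?#]/)[0];
  return path
    .split('/')
    // Replace numeric IDs and UUIDs with a fixed placeholder.
    .map((seg) => (/^\d+$/.test(seg) || UUID_RE.test(seg) ? ':id' : seg))
    .join('/');
}

// Inside a k6 script (k6 runtime only, so shown here as a comment):
//   import http from 'k6/http';
//   const url = `https://example.com/api/users/${userId}`;
//   http.get(url, { tags: { name: nameTagFor(url) } });
```

With this, `/api/users/1` and `/api/users/2` both report under the name `/api/users/:id` instead of creating one entity per URL. If you have only a handful of endpoints, passing a fixed string as the `name` tag is simpler.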
Regarding the p90/p95 calculations, I found that Last* is the calculation method that makes the dashboard match the console output. The k6 Prometheus dashboard (available here) uses Mean by default for quantile values, which does not match the quantiles in the console output. Is there a reason Mean is used for quantiles in this dashboard, or is this something that should be updated?
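To illustrate why the two calculations disagree, here is a toy example. The numbers are made up. It assumes each point in the series is the p95 computed over all requests so far, so the final point is the one that lines up with the end-of-test console summary.

```javascript
// Hypothetical p95 values (ms) as they might appear over the course of a test.
const runningP95 = [120, 180, 240, 310, 305];

// What a "Mean" calculation does: average every point in the series.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// What a "Last" calculation does: take only the final point.
function last(values) {
  return values[values.length - 1];
}

console.log('Mean:', mean(runningP95)); // 231
console.log('Last:', last(runningP95)); // 305
```

Mean pulls in the early low values from the ramp-up, so it lands well below the final quantile, while Last reports the same figure as the end of the test.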