Hi there, I’m looking for some help diagnosing a timeout issue in our production Grafana.
We are running Grafana 6.1.6 on OpenShift 3.11, using the image grafana/grafana:6.1.6. We are pulling data from Thanos, which exposes a Prometheus-compatible API.
Queries against this datasource consistently time out in Grafana after 30 seconds. The user is met with an error message that says "504 Gateway Time-out The server didn't respond in time."
In the Grafana pod logs we see corresponding log lines like: http: proxy error: context canceled
In /etc/grafana/grafana.ini we have this block set to configure the datasource timeout:
[dataproxy]
timeout = 240
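One way to double-check that the file actually parses to the value we expect is sketched below. This is a minimal sketch, assuming Python is available wherever a copy of grafana.ini can be read (it may not be inside the Grafana image itself); the helper name `read_dataproxy_timeout` is hypothetical and not part of Grafana.

```python
import configparser


def read_dataproxy_timeout(ini_text):
    """Return the [dataproxy] timeout as an int, or None if it is not set.

    Hypothetical helper for illustration only; it is not part of Grafana.
    """
    # interpolation=None: literal '%' characters in other values do not raise.
    # strict=False: tolerate duplicate sections or keys instead of failing.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Prepend a dummy section so keys that appear before the first
    # [section] header do not trigger MissingSectionHeaderError.
    parser.read_string("[__top__]\n" + ini_text)
    if not parser.has_option("dataproxy", "timeout"):
        return None
    return parser.getint("dataproxy", "timeout")


# Example usage against a copy of the file (path taken from the post):
#   with open("/etc/grafana/grafana.ini") as f:
#       print(read_dataproxy_timeout(f.read()))
```

If this returns None, the setting is either commented out or in a different file from the one being inspected.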
We suspected the OpenShift route could be timing out, as described in this GitHub issue. However, we have confirmed that is not the case by querying the Thanos API directly with the same queries that fail in Grafana. For reference, the query responds after about 5 minutes (well after the 30-second timeout we’re seeing).
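A direct check of this kind can be sketched roughly as follows. It is an illustration rather than the exact commands we ran: it assumes the standard Prometheus `/api/v1/query_range` endpoint, and the host name, query, and helper names are placeholders.

```python
import time
import urllib.parse
import urllib.request


def build_query_url(base_url, query, start, end, step):
    """Build a Prometheus-style range query URL (hypothetical helper)."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def time_request(url, timeout_s=600):
    """Fetch the URL and return (HTTP status, elapsed seconds).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as
    a 504, so a gateway timeout surfaces as an exception here.
    """
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()  # drain the body so the full response time is measured
        status = resp.status
    return status, time.monotonic() - started


# Example usage (placeholder host and query, assumed for illustration):
#   now = int(time.time())
#   url = build_query_url("http://thanos-query.example:9090", "up",
#                         now - 3600, now, 60)
#   print(time_request(url))
```

Running the same request once against Thanos directly and once through the Grafana datasource proxy should show which hop cuts the connection at 30 seconds.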
It appears that there is some timeout configured somewhere in Grafana that we are hitting, but we’re at a loss for where it might be. Any suggestions?