Grafana timing out when querying Prometheus datasource

Hi there, I’m looking for some help diagnosing a timeout issue in our production Grafana.

We are running Grafana 6.1.6 on OpenShift 3.11, using the image grafana/grafana:6.1.6. We are pulling data from Thanos, which exposes a Prometheus-compatible API.

Queries against this datasource consistently time out in Grafana after 30 seconds. The user is shown the error "504 Gateway Time-out: The server didn't respond in time", and the Grafana pod logs show corresponding lines like "http: proxy error: context canceled".

In /etc/grafana/grafana.ini we have the following set to configure the datasource timeout:

timeout = 240
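For context, the documented home for this setting is the [dataproxy] section of grafana.ini, whose default of 30 seconds matches the cutoff we are seeing, so the full block we intend is:

```ini
[dataproxy]
# How long the data proxy waits before timing out.
# Grafana's default is 30 seconds, which matches the cutoff we observe;
# a timeout key placed under a different section would not take effect.
timeout = 240
```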

We suspected the OpenShift route could be timing out, as described in this GitHub issue, but we have confirmed that is not the case by querying the Thanos API directly with the same queries that fail in Grafana. For reference, those queries return after about 5 minutes, well past the 30-second timeout we are seeing.

It appears that there is some timeout configured somewhere in Grafana that we are hitting, but we’re at a loss for where it might be. Any suggestions?
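This is not Grafana's code, but to illustrate the failure mode we suspect (a client-side deadline firing while the upstream is still working, which would produce exactly this kind of proxy error), here is a self-contained Python sketch; the 0.1-second timeout and 2-second handler are stand-ins for the 30-second cutoff and our 5-minute Thanos query:

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request

class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for a slow upstream, like our ~5-minute Thanos query."""
    def do_GET(self):
        time.sleep(2)  # answer slower than the client's deadline
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # client already gave up and closed the connection
    def log_message(self, *args):
        pass  # keep the demo's output clean

server = http.server.HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A short client-side deadline, standing in for the proxy's 30 s timeout.
try:
    urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/", timeout=0.1)
    timed_out = False
except socket.timeout:
    timed_out = True
except urllib.error.URLError as exc:
    timed_out = isinstance(exc.reason, socket.timeout)

server.shutdown()
print("client gave up before the upstream answered:", timed_out)
```

The upstream here finishes its work either way; it is the waiting side that aborts, which matches the "context canceled" wording in our logs.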


We are seeing the same issue with a TSDB datasource: no matter what timeout we configure in grafana.ini, the query times out after 15 seconds. Could someone help diagnose this?

Our Grafana server is not behind a proxy. The query takes ~20 seconds to complete when executed directly against TSDB, but Grafana times out after 15 seconds:

t=2021-08-05T17:11:00+0000 lvl=eror msg="Data proxy error" logger=data-proxy-log userId=1 orgId=2 uname=admin path=/api/datasources/proxy/1/api/query remote_addr= referer="" error="http: proxy error: context canceled"
t=2021-08-05T17:11:00+0000 lvl=eror msg="Request Completed" logger=context userId=1 orgId=2 uname=admin method=POST path=/api/datasources/proxy/1/api/query status=502 remote_addr= time_ms=15067 size=0 referer=""