504 Gateway Error when connecting to MSSQL Server

We are running Grafana Enterprise in an OpenShift environment. We have several MSSQL data sources which connect to different servers and databases. Everything works fine, but as soon as a SQL query takes longer than 30 seconds we receive the following error:

504 Gateway Time-out: The server didn’t respond in time

We have tried increasing the timeout in grafana.ini and we have checked the timeouts in SQL Server, but we could not solve the issue. If we run the query directly on SQL Server it takes around 45s.

We have read some issues that point to a timeout in the web server itself. Has anyone had a similar issue?

We are running Grafana 12.2.1

45s is an eternity for a query. I would recommend you fix the issue in MSSQL.

What does the query look like?

You need to fine-tune things, or have Grafana fetch the data from a flattened single table (emulating a DW) that has all the values you need and is prepopulated via a cron job.

If the tables you query from are also being used by other systems (OLTP), you will face issues.
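A minimal sketch of that pattern, assuming a daily scheduled job (e.g. SQL Server Agent) that repopulates a small summary table which the Grafana panel then queries; all table and column names here are illustrative, not from this thread:

```sql
-- Hypothetical summary table that the dashboard reads from.
CREATE TABLE dbo.DashboardSummary (
    Bucket   datetime2 NOT NULL PRIMARY KEY,
    MaxValue int       NOT NULL,
    AvgValue bigint    NOT NULL
);

-- Daily refresh job: rebuild the summary from the large source table.
TRUNCATE TABLE dbo.DashboardSummary;
INSERT INTO dbo.DashboardSummary (Bucket, MaxValue, AvgValue)
SELECT DATE_BUCKET(MINUTE, 5, EventTimeUtc),
       MAX(SomeValue),
       AVG(CAST(SomeValue AS BIGINT))
FROM dbo.LargeSourceTable
GROUP BY DATE_BUCKET(MINUTE, 5, EventTimeUtc);
```

The dashboard query then scans only the pre-aggregated rows, which should return well under any gateway timeout.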

Thank you for the fast reply.

The query we are using is a data-quality analysis query which is visualized in a Grafana dashboard. It will only be used once a day during backfill operations. That is also why we chose to do it quickly with a query instead of a table which we would have to load first.

As described above, the query runs fine in SQL Server Management Studio. We know 45s is not ideal, but it is also not really, really bad :wink: and we have already optimized the query as much as possible.

We hoped there is a setting / configuration where you can configure the timeout, but unfortunately we still don’t know where the timeout is coming from. We cannot find anything in the Grafana logs.

I guess:

Of course any networking involved (OpenShift included) may have its own timeouts, which you may need to reconfigure as well.

We already tested the [dataproxy] settings and set the timeout to 90 with no effect. For clarification, is the data proxy always used?

#################################### Data proxy ###########################
    [dataproxy]

    # This enables data proxy logging, default is false
    logging = false

    # How long the data proxy waits to read the headers of the response before timing out, default is 30 seconds.
    # This setting also applies to core backend HTTP data sources where query requests use an HTTP client with timeout set.
    timeout = 90

    # How long the data proxy waits to establish a TCP connection before timing out, default is 10 seconds.
    dialTimeout = 10

    # How many seconds the data proxy waits before sending a keepalive request.
    keep_alive_seconds = 30

Can you share the query so we can see what is happening?

The query is used to determine the delay between Kafka and the database. If I run it directly on SQL Server (SSMS), it takes around 45 seconds for the last 7 days.

SELECT
     DATE_BUCKET(MINUTE, ${BucketSize_Min}, ImportDateTimeUtc) AS Bucket
    ,MAX(DATEDIFF(SECOND, [ReceivedDateTimeUtc], [ImportDateTimeUtc])) AS MaxTotalDelayInSeconds
    ,AVG(CAST(DATEDIFF(SECOND, [ReceivedDateTimeUtc], [ImportDateTimeUtc]) AS BIGINT)) AS AvgTotalDelayInSeconds
    ,MAX(DATEDIFF(SECOND, [KafkaTimestamp], [ImportDateTimeUtc])) AS MaxDatabaseDelayInSeconds
    ,AVG(CAST(DATEDIFF(SECOND, [KafkaTimestamp], [ImportDateTimeUtc]) AS BIGINT)) AS AvgDatabaseDelayInSeconds
FROM [ETP].[Disk].[Quote_${Venue_Account}_Active] WITH (NOLOCK)
WHERE ImportDateTimeUtc BETWEEN $__timeFrom() AND $__timeTo()
GROUP BY DATE_BUCKET(MINUTE, ${BucketSize_Min}, ImportDateTimeUtc)
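For anyone unfamiliar with DATE_BUCKET: it floors each timestamp to the start of its N-minute bucket, counted from an origin (SQL Server’s default origin is 1900-01-01). Roughly equivalent to this Python sketch:

```python
from datetime import datetime, timedelta

def date_bucket_minute(ts, bucket_size_min, origin=datetime(1900, 1, 1)):
    """Floor `ts` to the start of its N-minute bucket, like T-SQL's
    DATE_BUCKET(MINUTE, N, ts) with the default 1900-01-01 origin."""
    delta_min = int((ts - origin).total_seconds() // 60)
    floored = delta_min - (delta_min % bucket_size_min)
    return origin + timedelta(minutes=floored)

# With 5-minute buckets, 12:07 and 12:09 both land in the 12:05 bucket.
print(date_bucket_minute(datetime(2024, 1, 1, 12, 7), 5))  # → 2024-01-01 12:05:00
```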

In Grafana I receive the following error:

Sorry for the late response. I posted the query and a screenshot yesterday, but somehow it got caught in a spam filter and needs to be checked by a staff member…

Nevertheless, we have found the issue in OpenShift. The ingress timeout has a default value of 30s, after which it closes the session. So in our case Grafana fires the SQL query, which runs for 45s, but after 30s OpenShift actively closes the session, and that’s why we receive the error.

We could solve it with this config:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myapp
  annotations:
    haproxy.router.openshift.io/timeout: 90s
spec:
  host: myapp.example.com
  to:
    kind: Service
    name: myapp-service
  port:
    targetPort: 8080

The setting can be added under annotations.

haproxy.router.openshift.io/timeout: 90s
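If you prefer the CLI over editing the Route YAML, the same annotation can presumably be set with `oc annotate` (the route name `myapp` is taken from the example above):

```shell
# Set (or update) the HAProxy timeout annotation on the existing route.
oc annotate route myapp haproxy.router.openshift.io/timeout=90s --overwrite
```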

I mentioned that:

And it may not only be the Ingress. A Service of type LoadBalancer may also be involved, so generally the whole network chain must be configured to support that timeout.


Yes, and that’s why we talked with the team responsible for OpenShift. They analysed it and came up with this solution this morning. Thank you for the hint :slightly_smiling_face: