Increasing query timeout to Loki datasource

  • What Grafana version and what operating system are you using?
    8.3.3

  • What are you trying to achieve?
    A query timeout to the Loki data source higher than 10s, so that the full
    data graph in Grafana (enabled by the fullRangeLogsVolume feature toggle) can be shown

  • How are you trying to achieve it?
    Increasing dataproxy timeout configuration

  • What happened?
    The query still times out at 10s

  • What did you expect to happen?
    The full data graph in Grafana can be shown, with a timeout longer than 10s

  • Can you copy/paste the configuration(s) that you are having problems with?
    grafana.ini:
    dataproxy:
    logging: true
    timeout: 600
    keep_alive_seconds: 600
    dialTimeout: 600
    tls_handshake_timeout_seconds: 600
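
    Note that the keys above are written YAML-style (e.g. as Helm values); in
    grafana.ini itself they live under a [dataproxy] section. A sketch of the
    equivalent ini form (key names as I understand the Grafana configuration
    docs — worth double-checking against your Grafana version):

    ```ini
    [dataproxy]
    logging = true
    timeout = 600
    keep_alive_seconds = 600
    dialTimeout = 600
    tls_handshake_timeout_seconds = 600
    ```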

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
    In the Grafana UI, and in the Grafana logs:

Error log in Grafana

t=2022-01-06T09:34:00+0000 lvl=eror msg="Data proxy error" logger=data-proxy-log userId=1 orgId=1 uname=devops path=/api/datasources/proxy/1/loki/api/v1/query_range remote_addr="111.94.8.212, 34.102.200.171" referer="https://logging-grafana.dev.kmklabs.com/explore?orgId=1&left=%5B%22now-2d%22,%22now%22,%22Loki%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22banzai%5C%22%7D%22%7D%5D" error="http: proxy error: context canceled"
t=2022-01-06T09:34:00+0000 lvl=eror msg="Request Completed" logger=context userId=1 orgId=1 uname=devops method=GET path=/api/datasources/proxy/1/loki/api/v1/query_range status=502 remote_addr="111.94.8.212, 34.102.200.171" time_ms=15049 size=0 referer="https://logging-grafana.dev.kmklabs.com/explore?orgId=1&left=%5B%22now-2d%22,%22now%22,%22Loki%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22banzai%5C%22%7D%22%7D%5D"
  • Did you follow any online instructions? If so, what is the URL?
    No

Hi @danielfablius and welcome to the community!

This is a good question. There are a few places along the way where this timeout could be happening.

A few configuration options to check in Grafana and Loki:

  • Try increasing the query timeouts in your Loki config:

        querier:
          engine:
            timeout: 5m
          query_timeout: 5m

  • Try a longer timeout for the Loki datasource in the datasource configuration (this option was added in 8.0). This setting overrides dataproxy.timeout for an individual HTTP datasource.
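
If the datasource is provisioned from a file rather than configured in the UI, the same per-datasource timeout can, as far as I can tell, be set under jsonData. A sketch (the name and URL here are assumptions, not taken from your setup):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway:3100   # assumed gateway address
    jsonData:
      timeout: 600      # per-datasource HTTP request timeout in seconds
      maxLines: 1000
```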

[Screenshot: Screen Shot 2022-01-06 at 3.51.11 PM — the timeout field in the Loki datasource settings]


Hi @meloriarellano1,

Thanks for your reply.

I have tried those configurations, except for the datasource timeout (because I already use the dataproxy timeout config), and it still gives the same error.

For more information, I use the loki-simple-scalable chart to set up my Grafana-Loki stack, so there is an nginx gateway between Grafana and Loki.
Here is the nginx log from when the timeout happened:

10.252.48.145 - - [07/Jan/2022:03:30:00 +0000]  499 "GET /loki/api/v1/query_range?direction=BACKWARD&limit=1805&query=sum%20by%20(level)%20(count_over_time(%7Bnamespace%3D%22banzai%22%7D%5B1h%5D))&start=1641353385548000000&end=1641526185548000000&step=3600 HTTP/1.1" 0 "-" "Grafana/8.3.3" "111.94.8.212, 34.102.200.171, 130.211.2.184, 130.211.2.184"

It shows a 499 response code, which as far as I know indicates that the connection was terminated by the client (Grafana). So I was thinking the dataproxy timeout should affect this, but changing that config has no effect at all.

Here is my complete grafana.ini:

[analytics]
check_for_updates = true
[auth.google]
allow_sign_up = true
auth_url = https://accounts.google.com/o/oauth2/auth
client_id = **
client_secret = **
enabled = true
scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
token_url = https://accounts.google.com/o/oauth2/token
[dataproxy]
logging = true
timeout = 600
[feature_toggles]
enable = fullRangeLogsVolume
[grafana_net]
url = https://grafana.net
[log]
mode = console
[paths]
data = /var/lib/grafana/
logs = /var/log/grafana
plugins = /var/lib/grafana/plugins
provisioning = /etc/grafana/provisioning
[server]
root_url = https://logging-grafana.dev.kmklabs.com
[users]
auto_assign_org = true
auto_assign_org_role = Editor

And here is my Loki config:

auth_enabled: false
common:
  path_prefix: /var/loki
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist
  storage:
    gcs:
      bucket_name: infra-logs-dev-loki
limits_config:
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  join_members:
  - 'loki-loki-loki-simple-scalable-memberlist'
querier:
  engine:
    timeout: 10m
  query_timeout: 10m
schema_config:
  configs:
  - chunks:
      period: 24h
      prefix: chunk_loki_
    from: "2021-11-22"
    index:
      period: 24h
      prefix: index_loki_
    object_store: gcs
    schema: v11
    store: boltdb-shipper
server:
  http_listen_port: 3100

Is there anything else I can do?

Hi @danielfablius, you could try to query Loki via the API from your Grafana server to see if the request succeeds.

Other users have found that proxies or load balancers elsewhere in the network path were the source of the timeout.

There was a bug in an earlier release, but that appears to have been resolved before 8.3.3: Datasource: Fix dataproxy timeout should always be applied for outgoing data source HTTP requests
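
Since the loki-simple-scalable chart puts an nginx gateway in front of Loki, its proxy timeouts are worth checking too. A hypothetical snippet using stock nginx directives (the location and upstream names here are illustrative, not taken from the chart):

```nginx
location /loki/api/ {
    proxy_read_timeout 600s;   # how long nginx waits for a response from Loki
    proxy_send_timeout 600s;
    proxy_pass http://loki-read:3100;   # placeholder upstream
}
```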

Hi @meloriarellano1,

I tried querying Loki via the API:

curl -G -s  "http://localhost:8080/loki/api/v1/query_range?start=1641269178&end=1641701178&limit=1000" --data-urlencode 'query={namespace="banzai"}' | jq
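
(For reference, query_range accepts start/end as Unix epochs — seconds here, nanoseconds in the nginx log above. A sketch of generating them, assuming GNU date; the -d flag differs on macOS:)

```shell
# Build start/end epochs for Loki's query_range (GNU date assumed).
START=$(date -u -d '2022-01-04 04:06:18' +%s)  # seconds; matches start=1641269178 above
END=$(date -u -d '2022-01-09 04:06:18' +%s)    # matches end=1641701178
echo "start=$START end=$END"
```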

I got a result, but only the 1000 log lines allowed by the limit parameter were returned, with the following stats in the response:

"stats": {
      "summary": {
        "bytesProcessedPerSecond": 18811454,
        "linesProcessedPerSecond": 63438,
        "totalBytesProcessed": 802117,
        "totalLinesProcessed": 2705,
        "execTime": 0.042639818
      },
      "store": {
        "totalChunksRef": 946,
        "totalChunksDownloaded": 50,
        "chunksDownloadTime": 0.00420724,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 802117,
        "decompressedLines": 2705,
        "compressedBytes": 68134622,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 0,
        "totalBatches": 0,
        "totalLinesSent": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      }
}

So I guess it works, but it seems it's not processing all the log lines in the time range I used, since the response says totalLinesProcessed: 2705. Does that count log lines? Because if it does, it's far off from the real number of log lines that should be there (there are some hundred thousand log lines in that period).
Is there anything I can do to make the timeout for my log graph in Grafana longer, so I can see the log graph?
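
One way to check how many lines actually fall in the window, independent of the limit parameter, is a metric query over the same selector — an untested sketch, with [5d] chosen to roughly match the five-day range in the curl above:

```logql
sum(count_over_time({namespace="banzai"}[5d]))
```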

Thanks

Hi @danielfablius, I did some digging and found that there’s a hardcoded timeout in the Loki datasource when the fullRangeLogsVolume feature flag is enabled in Grafana. From what I’ve found, it looks like it’s set to 10s (10000ms) while the feature is in beta, to protect against long-running queries.


Thanks for the info @meloriarellano1

So, if we enable the fullRangeLogsVolume feature toggle, there is no way to increase the timeout?
If that’s the case, I guess I need to wait until this feature is no longer in beta?

Hi @danielfablius, another option is to make your query more specific, so the query result is smaller and returns within the timeout.