Increasing query timeout to Loki datasource

  • What Grafana version and what operating system are you using?
    8.3.3

  • What are you trying to achieve?
    A query timeout to the Loki data source higher than 10s, so that the full
    data graph in Grafana (enabled by the fullRangeLogsVolume feature toggle) can be shown

  • How are you trying to achieve it?
    Increasing dataproxy timeout configuration

  • What happened?
    The query still times out at 10s

  • What did you expect to happen?
    The full data graph in Grafana can be shown, with a timeout longer than 10s

  • Can you copy/paste the configuration(s) that you are having problems with?
    grafana.ini:
    dataproxy:
    logging: true
    timeout: 600
    keep_alive_seconds: 600
    dialTimeout: 600
    tls_handshake_timeout_seconds: 600
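
    Note that the keys above are written YAML-style (e.g. as Helm values); in
    grafana.ini itself they live under a [dataproxy] section. A sketch of the
    equivalent ini form (key names as I understand the Grafana configuration
    docs — worth double-checking against your Grafana version):

    ```ini
    [dataproxy]
    logging = true
    timeout = 600
    keep_alive_seconds = 600
    dialTimeout = 600
    tls_handshake_timeout_seconds = 600
    ```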

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
    In the Grafana UI, and in the Grafana logs:

Error log in Grafana

t=2022-01-06T09:34:00+0000 lvl=eror msg="Data proxy error" logger=data-proxy-log userId=1 orgId=1 uname=devops path=/api/datasources/proxy/1/loki/api/v1/query_range remote_addr="111.94.8.212, 34.102.200.171" referer="https://logging-grafana.dev.kmklabs.com/explore?orgId=1&left=%5B%22now-2d%22,%22now%22,%22Loki%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22banzai%5C%22%7D%22%7D%5D" error="http: proxy error: context canceled"
t=2022-01-06T09:34:00+0000 lvl=eror msg="Request Completed" logger=context userId=1 orgId=1 uname=devops method=GET path=/api/datasources/proxy/1/loki/api/v1/query_range status=502 remote_addr="111.94.8.212, 34.102.200.171" time_ms=15049 size=0 referer="https://logging-grafana.dev.kmklabs.com/explore?orgId=1&left=%5B%22now-2d%22,%22now%22,%22Loki%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22banzai%5C%22%7D%22%7D%5D"
  • Did you follow any online instructions? If so, what is the URL?
    No

Hi @danielfablius and welcome to the community!

This is a good question. There are a few places along the way where this timeout could be happening.

A few configuration options to check in Grafana and Loki:

  • Try increasing the query timeouts in your Loki config:

        querier:
          engine:
            timeout: 5m
          query_timeout: 5m

  • Try a longer timeout for the Loki datasource in the datasource configuration (this option was added in 8.0). This setting overrides dataproxy.timeout for an individual HTTP datasource.
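
If the datasource is provisioned from a file rather than configured in the UI, the same per-datasource timeout can, as far as I can tell, be set under jsonData. A sketch (the name and URL here are assumptions, not taken from your setup):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway:3100   # assumed gateway address
    jsonData:
      timeout: 600      # per-datasource HTTP request timeout in seconds
      maxLines: 1000
```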

[Screenshot: Screen Shot 2022-01-06 at 3.51.11 PM — the timeout field in the Loki datasource settings]


Hi @meloriarellano1,

Thanks for your reply.

I have tried those configurations, except for the datasource timeout (because I already use the dataproxy timeout config), and it still gives the same error.

For more information, I use the loki-simple-scalable chart to set up my Grafana-Loki stack, so there is an nginx gateway between Grafana and Loki.
Here is the nginx log from when the timeout happened:

10.252.48.145 - - [07/Jan/2022:03:30:00 +0000]  499 "GET /loki/api/v1/query_range?direction=BACKWARD&limit=1805&query=sum%20by%20(level)%20(count_over_time(%7Bnamespace%3D%22banzai%22%7D%5B1h%5D))&start=1641353385548000000&end=1641526185548000000&step=3600 HTTP/1.1" 0 "-" "Grafana/8.3.3" "111.94.8.212, 34.102.200.171, 130.211.2.184, 130.211.2.184"

It shows a 499 response code, which as far as I know indicates that the connection was terminated by the client (Grafana). So I was thinking the dataproxy timeout should affect this, but changing that config has no effect at all.

Here is my complete grafana.ini:

[analytics]
check_for_updates = true
[auth.google]
allow_sign_up = true
auth_url = https://accounts.google.com/o/oauth2/auth
client_id = **
client_secret = **
enabled = true
scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
token_url = https://accounts.google.com/o/oauth2/token
[dataproxy]
logging = true
timeout = 600
[feature_toggles]
enable = fullRangeLogsVolume
[grafana_net]
url = https://grafana.net
[log]
mode = console
[paths]
data = /var/lib/grafana/
logs = /var/log/grafana
plugins = /var/lib/grafana/plugins
provisioning = /etc/grafana/provisioning
[server]
root_url = https://logging-grafana.dev.kmklabs.com
[users]
auto_assign_org = true
auto_assign_org_role = Editor

And here is my Loki config:

auth_enabled: false
common:
  path_prefix: /var/loki
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist
  storage:
    gcs:
      bucket_name: infra-logs-dev-loki
limits_config:
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  join_members:
  - 'loki-loki-loki-simple-scalable-memberlist'
querier:
  engine:
    timeout: 10m
  query_timeout: 10m
schema_config:
  configs:
  - chunks:
      period: 24h
      prefix: chunk_loki_
    from: "2021-11-22"
    index:
      period: 24h
      prefix: index_loki_
    object_store: gcs
    schema: v11
    store: boltdb-shipper
server:
  http_listen_port: 3100

Is there anything else I can do?

Hi @danielfablius, you could try to query Loki via the API from your Grafana server to see if the request succeeds.

Other users have found that proxies or load balancers elsewhere in the network path were the source of the timeout.

There was a bug in an earlier release, but that appears to have been resolved before 8.3.3: Datasource: Fix dataproxy timeout should always be applied for outgoing data source HTTP requests
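
Since the loki-simple-scalable chart puts an nginx gateway in front of Loki, its proxy timeouts are worth checking too. A hypothetical snippet using stock nginx directives (the location and upstream names here are illustrative, not taken from the chart):

```nginx
location /loki/api/ {
    proxy_read_timeout 600s;   # how long nginx waits for a response from Loki
    proxy_send_timeout 600s;
    proxy_pass http://loki-read:3100;   # placeholder upstream
}
```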

Hi @meloriarellano1,

I tried querying Loki via the API:

curl -G -s  "http://localhost:8080/loki/api/v1/query_range?start=1641269178&end=1641701178&limit=1000" --data-urlencode 'query={namespace="banzai"}' | jq
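
(For reference, query_range accepts start/end as Unix epochs — seconds here, nanoseconds in the nginx log above. A sketch of generating them, assuming GNU date; the -d flag differs on macOS:)

```shell
# Build start/end epochs for Loki's query_range (GNU date assumed).
START=$(date -u -d '2022-01-04 04:06:18' +%s)  # seconds; matches start=1641269178 above
END=$(date -u -d '2022-01-09 04:06:18' +%s)    # matches end=1641701178
echo "start=$START end=$END"
```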

I got a result, but only the 1000 log lines allowed by the limit parameter were returned, with the following stats in the response:

"stats": {
      "summary": {
        "bytesProcessedPerSecond": 18811454,
        "linesProcessedPerSecond": 63438,
        "totalBytesProcessed": 802117,
        "totalLinesProcessed": 2705,
        "execTime": 0.042639818
      },
      "store": {
        "totalChunksRef": 946,
        "totalChunksDownloaded": 50,
        "chunksDownloadTime": 0.00420724,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 802117,
        "decompressedLines": 2705,
        "compressedBytes": 68134622,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 0,
        "totalBatches": 0,
        "totalLinesSent": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      }
}

So I guess it works, but it seems it's not processing all the log lines in the time range I used, since the response says totalLinesProcessed: 2705. Does that count log lines? Because if it does, it's far off from the real number of log lines that should be there (there are some hundred thousand log lines in that period).
Is there anything I can do to make the timeout for my log graph in Grafana longer, so I can see the log graph?
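
One way to check how many lines actually fall in the window, independent of the limit parameter, is a metric query over the same selector — an untested sketch, with [5d] chosen to roughly match the five-day range in the curl above:

```logql
sum(count_over_time({namespace="banzai"}[5d]))
```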

Thanks

Hi @danielfablius, I did some digging and found that there’s a hardcoded timeout in the Loki datasource when the fullRangeLogsVolume feature flag is enabled in Grafana. From what I’ve found, it looks like it’s set to 10s (10000ms) while the feature is in beta, to protect against long-running queries.


Thanks for the info @meloriarellano1

So, if we enable the fullRangeLogsVolume feature toggle, there is no way to increase the timeout?
If that’s the case, I guess I need to wait until this feature is no longer in beta?

Hi @danielfablius, another option is to make your query more specific, so the query result is smaller and returns within the timeout.