Loki won't execute large queries

I have 1 query frontend deployed on a VM, with 2 queriers, each on a separate VM, pulling queries from the frontend. Queries are executed in Grafana.

When executing queries (refreshing a dashboard) over a small time range everything works fine; however, at around a 12-hour time range I get a 502 Bad Gateway after a while.

Values I tried to edit:

In loki:
querier:
  query_timeout: 10m
  engine:
    timeout: 10m

server:
  http_server_read_timeout: 10m
  http_server_write_timeout: 10m

In grafana:
[dataproxy]
timeout = 600

I’m not sure whether this is a Grafana or a Loki problem, but I think it’s Grafana-related, since the 502 isn’t sent by the query-frontend. The query-frontend gets a TCP write error.

I was able to solve it. It was a timeout on both ends, sometimes Grafana and sometimes Loki, depending on which settings I had changed.

I never set the timeouts in both Grafana and Loki at the same time, and always seeing no effect made me think the settings were useless. It turns out you need all of them.
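
One way to picture why changing a single setting showed no effect: the request passes through several hops, each with its own timeout, and the shortest one cuts the request first. A minimal Python sketch of that reasoning follows. The hop labels and the starting values are made up for illustration and are not real defaults of either product.

```python
def limiting_hop(timeouts):
    """Return (name, seconds) of the hop with the shortest timeout."""
    name = min(timeouts, key=timeouts.get)
    return name, timeouts[name]


# Hypothetical starting values in seconds (assumptions, not real defaults).
timeouts = {
    "grafana [dataproxy] timeout": 30,
    "loki querier query_timeout": 60,
    "loki querier engine timeout": 60,
    "loki server http_server_write_timeout": 30,
}

# Raising only the Grafana side still leaves a 30 s limit on the Loki side.
timeouts["grafana [dataproxy] timeout"] = 600
print(limiting_hop(timeouts))

# Only after every hop is raised does the effective limit move.
for name in timeouts:
    timeouts[name] = 600
print(limiting_hop(timeouts))
```

A proxy in front of Grafana or Loki would be one more entry in the same dictionary.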

Hi, @m3r1

I have the same issue, how did you resolve this? Thanks.

As I mentioned, look at the Grafana configuration:

[dataproxy]
timeout

In Grafana 8 you can set the timeout from the UI when configuring the data source.

On the Loki side I’m not exactly sure what fixed this; look into the following querier configuration:

query_timeout

engine:
  timeout

Thank you @m3r1. It turns out my issue is not just a timeout issue; I also have performance issues on the Loki server. I posted my issue here: loki crashed for large queries · Issue #4582 · grafana/loki · GitHub.

what worked for me:

loki:
  enabled: true
  extraArgs:
    querier.query-timeout: 5m
    querier.engine.timeout: 5m
    server.http-read-timeout: 5m
    server.http-write-timeout: 5m

grafana:
  grafana.ini:
    dataproxy:
      timeout: 300
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki-stack:3100
      version: 1
      jsonData:
        maxLines: 2000
        timeout: 300

@melnikovpetr123 I’m curious. We need to tweak our implementation to solve the same problem. Your changes look promising, so I’m attempting to test them. However, your Loki change does not give enough context for me to determine where to put the server.<property> attributes for distributed Loki. Do you happen to know which parent component server belongs to? Hopefully my question makes sense with respect to the term “distributed” Loki. Please let me know if I can articulate it further.

hey @snafux, tbh I didn’t get what your setup looks like. In the message above I provided the values we use for the corresponding Helm charts.

this is how they get translated to parameters (specifically for loki):
/usr/bin/loki -config.file=/etc/loki/loki.yaml -querier.engine.timeout=5m -querier.parallelise-shardable-queries=false -querier.query-timeout=5m -server.http-read-timeout=5m -server.http-write-timeout=5m
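
To make the translation concrete, here is a rough Python sketch of it. It assumes the chart simply turns each extraArgs key into a -key=value flag in sorted key order; that is a guess based on the command line above, not the chart's actual template, and render_extra_args is a hypothetical helper name.

```python
def render_extra_args(extra_args):
    """Turn a mapping of flag names to values into -name=value strings,
    sorted by flag name (assumed ordering, inferred from the command above)."""
    return [f"-{key}={value}" for key, value in sorted(extra_args.items())]


# The extraArgs from the values posted earlier in this thread.
extra_args = {
    "querier.query-timeout": "5m",
    "querier.engine.timeout": "5m",
    "server.http-read-timeout": "5m",
    "server.http-write-timeout": "5m",
}

command = [
    "/usr/bin/loki",
    "-config.file=/etc/loki/loki.yaml",
    *render_extra_args(extra_args),
]
print(" ".join(command))
```

The real command line above also carries -querier.parallelise-shardable-queries=false, which is not in the values I posted, so the sketch's output will not include it.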

just a side note: the configuration of Grafana and/or an HTTP proxy (running in front of Loki/Grafana) may also lead to timeout issues
