I have one query-frontend deployed on a VM, with two queriers each on a separate VM pulling queries from the frontend. Queries are executed from Grafana.
When executing queries (refreshing a dashboard) over a small time range everything works fine; however, at around a 12-hour time range I eventually get a 502 Bad Gateway.
Values I tried to edit:
In Loki:
querier:
  query_timeout: 10m
  engine:
    timeout: 10m
I’m not sure whether this is a Grafana or a Loki problem, but I do think it’s Grafana-related, since the 502 isn’t sent from the query-frontend. The query-frontend gets a TCP write error.
I was able to solve it. It was a timeout on both ends, sometimes Grafana and sometimes Loki, depending on which configuration I changed.
I had never applied the configuration to both Grafana and Loki at the same time, and always seeing no effect made me think each setting was useless; it turns out you need all of them.
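For anyone else landing here: on the Grafana side, the knob that usually matters for long-running dashboard queries is the data proxy timeout, which defaults to 30 seconds. Below is a minimal sketch of raising it, assuming Grafana is deployed with the official Helm chart (on a plain install the same section and key live in grafana.ini under [dataproxy]); the 600-second value is just an example.

```yaml
# values.yaml for the grafana Helm chart (sketch) - the grafana.ini map below
# is rendered into the grafana.ini used by the Grafana instance
grafana.ini:
  dataproxy:
    # timeout for data source proxy requests, in seconds (default 30)
    timeout: 600
```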
@melnikovpetr123 I’m curious. We need to tweak our implementation to solve the same problem. Your changes look promising, so I’m attempting to test them. However, our Loki change does not give enough context for me to determine where to put the server.<property> attributes for distributed Loki. Do you happen to know which parent component the server block belongs to? Hopefully my question makes sense with respect to the term “distributed” Loki. Please let me know if I can further articulate.
Hey @snafux, tbh I didn’t quite get what your setup looks like. In the message above I provided the values we use for the corresponding Helm charts.
This is how they get translated into command-line parameters (specifically for Loki): /usr/bin/loki -config.file=/etc/loki/loki.yaml -querier.engine.timeout=5m -querier.parallelise-shardable-queries=false -querier.query-timeout=5m -server.http-read-timeout=5m -server.http-write-timeout=5m
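To hopefully answer the question about where server goes: server is a top-level section of loki.yaml, a sibling of querier, not nested inside any component block, and in a distributed deployment the same config file is typically shared by all components, so the HTTP read/write timeouts apply to the query-frontend and queriers alike. Here is a rough, untested sketch of how those flags map back to YAML (note that -querier.parallelise-shardable-queries appears to be configured under query_range in YAML, not under querier):

```yaml
# loki.yaml sketch - durations match the flags above, adjust to taste
server:
  http_server_read_timeout: 5m
  http_server_write_timeout: 5m

querier:
  query_timeout: 5m
  engine:
    timeout: 5m

query_range:
  # YAML location of -querier.parallelise-shardable-queries (assumption based on the Loki config reference)
  parallelise_shardable_queries: false
```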