Loki Queries Timing Out: Client.Timeout exceeded and context canceled

Hi everyone,

We’re running into an issue where Loki does not respond to simple, targeted queries, even over a short time range (15 minutes).

The Problem

A basic query like {namespace="xxx", component="yyy"} consistently fails.

Errors We’re Seeing

Grafana Frontend Error:

Get "http://loki-gateway...": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Loki Gateway Logs:

Access logs show HTTP 499 status codes for the query requests.

Querier Logs:

The queriers report RPC and scheduler errors:

level=error ts=... component=querier msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled"

level=error ts=... component=querier org_id=fake msg="error notifying scheduler about finished query" err=EOF

Research & Context

We’ve reviewed a few GitHub issues that seem related, particularly because we have also occasionally seen "Resource exhausted" errors:

https://github.com/grafana/loki/issues/6568

https://github.com/grafana/loki/issues/7649

Our setup seems to struggle even though the queries are simple and cover a narrow time window. We are looking for advice on what to investigate next: could this be a bottleneck in the querier/scheduler path, a resource allocation problem, or a specific configuration setting we should tune?

Any help or suggestions would be greatly appreciated!

How much log data are you ingesting, and how many Loki containers are you running?