Excessive rpc errors while querying

Hi there,

I have a Loki/Promtail installation in a Proxmox-LXC running Debian 11 (bullseye):

loki, version 2.9.3 (branch: HEAD, revision: 2535f9bede)
  build user:       root@998f10a08814
  build date:       2023-12-11T19:17:52Z
  go version:       go1.21.3
  platform:         linux/amd64
  tags:             netgo

There is nothing else installed on this system and the configuration is almost default (if I remember correctly I only altered the file paths and configured compaction/retention):

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /home/loki
  storage:
    filesystem:
      chunks_directory: /home/loki/chunks
      rules_directory: /home/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

compactor:
  working_directory: /home/loki/retention
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  retention_period: 168h

analytics:
  reporting_enabled: false

The Problem: Whenever I start a query, my Loki host logs errors like these:

Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.725590944Z caller=scheduler_processor.go:158 org_id=fake traceID=202ad36ef2fdce92 msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.725571964Z caller=scheduler_processor.go:252 org_id=fake traceID=202ad36ef2fdce92 frontend=127.0.0.1:9096 msg="error health checking" err="rpc error: code = Canceled desc = context canceled"
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.725533694Z caller=scheduler_processor.go:208 org_id=fake traceID=202ad36ef2fdce92 frontend=127.0.0.1:9096 msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled"
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.725494534Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=127.0.0.1:9096
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.72512618Z caller=retry.go:73 org_id=fake traceID=202ad36ef2fdce92 msg="error processing request" try=0 query="{hostname=\"OMITTED\"} |= \"\" | logfmt | level=\"error\"" err="context canceled"
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.515028293Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=127.0.0.1:9096
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.514998132Z caller=scheduler_processor.go:158 org_id=fake traceID=3b5af5c4b2f756d6 msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.514783521Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=127.0.0.1:9096
Jan 18 12:01:57 Loki loki[137]: level=error ts=2024-01-18T12:01:57.507858872Z caller=retry.go:73 org_id=fake traceID=3b5af5c4b2f756d6 msg="error processing request" try=0 query="{hostname=\"OMITTED\"} |= \"\" | logfmt | level=\"error\"" err="context canceled"
Jan 18 12:01:55 Loki loki[137]: level=error ts=2024-01-18T12:01:55.58424511Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=127.0.0.1:9096
Jan 18 12:01:55 Loki loki[137]: level=error ts=2024-01-18T12:01:55.583807945Z caller=scheduler_processor.go:158 org_id=fake traceID=180441b3eae9ec91 msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
Jan 18 12:01:55 Loki loki[137]: level=error ts=2024-01-18T12:01:55.58378826Z caller=scheduler_processor.go:252 org_id=fake traceID=180441b3eae9ec91 frontend=127.0.0.1:9096 msg="error health checking" err="rpc error: code = Canceled desc = context canceled"
Jan 18 12:01:55 Loki loki[137]: level=error ts=2024-01-18T12:01:55.583741492Z caller=scheduler_processor.go:208 org_id=fake traceID=180441b3eae9ec91 frontend=127.0.0.1:9096 msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled"
Jan 18 12:01:55 Loki loki[137]: level=error ts=2024-01-18T12:01:55.58369421Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=127.0.0.1:9096
Jan 18 12:01:55 Loki loki[137]: level=error ts=2024-01-18T12:01:55.581456433Z caller=retry.go:73 org_id=fake traceID=180441b3eae9ec91 msg="error processing request" try=0 query="{hostname=\"OMITTED\"} |= \"\" | logfmt | level=\"error\"" err="context canceled"

I found a few other threads regarding similar issues but none of them seems to have ever been resolved (if they were even identical).

Does anybody here have an idea why I am getting these errors?

Thank you in advance!

Regards,
Kinnison

Perhaps try removing instance_addr from your configuration and let Loki figure itself out. Depending on how you run the container 127.0.0.1 may or may not work. Also verify that the connectivity actually works from within the container.
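
As a rough sketch of that suggestion, the `common` block from the configuration posted above would look like this with `instance_addr` removed. Everything else is copied unchanged from the original post. The comment reflects my understanding of what Loki does when the address is not set, so treat it as an assumption rather than a guarantee.

```yaml
common:
  # instance_addr is omitted on purpose; the assumption is that Loki then
  # picks an address from its own network interfaces instead of 127.0.0.1.
  path_prefix: /home/loki
  storage:
    filesystem:
      chunks_directory: /home/loki/chunks
      rules_directory: /home/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
```

After restarting Loki with this change, requesting the `/ready` endpoint on the HTTP port (3100 in the posted config) from inside the container is one way to confirm that the instance came up.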

Thanks for answering!

I tried your suggestion, but unfortunately without success.

The connection from my Grafana host to the Loki host is working in general.
The logs I posted before were actually copied from the Grafana Explore site using data from Loki.
Both hosts are within the same subnet, so firewall rules should not play any role here.

Edit: Earlier I had installed Loki on the same host that my Grafana instance runs on, so the connection between both services was established within the same host, and yet the same errors were recorded.

Small update and possible solution:

I installed a new LXC using Ubuntu 22.04 instead of Debian 11, copied over all .conf files from the old LXC, and the error has not occurred so far.
I will observe if that changes when the amount of stored logs increases and queries become larger.

Hi, I encountered the same problem. Have you solved it?

Hi,

Yes, I think I did “fix” it by installing Loki/Promtail on Ubuntu 22.04 instead of Debian 11 (as strange as it might sound).

Regards,
Kinnison

Hi again,

unfortunately the issue is not solved. It is still happening on Ubuntu 22.04, though with fewer errors.

If I understand the error messages correctly, one part of Loki (the querier?) is losing its connection to the scheduler residing on the same host, right?

It seems the errors appear more often the larger the queried time frame is.

Is there a timeout parameter that I could try fiddling around with?

I did a lot of internet research on this. Many others have the same, or at least similar, problems with Loki, and nobody seems to know the exact cause or a workaround.

Regards,
Kinnison
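
On the timeout question: the fragment below sketches the timeout-related and query-splitting settings that could be tried on top of the configuration posted at the start of the thread. The option names are existing Loki settings as far as I know, but every value is an illustrative assumption, and nothing in this thread confirms that changing them removes the "context canceled" errors.

```yaml
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  # Assumed values: give slow HTTP requests more time before the server
  # closes the connection.
  http_server_read_timeout: 5m
  http_server_write_timeout: 5m

limits_config:
  retention_period: 168h
  # Assumed value: upper bound on how long a single query may run.
  query_timeout: 5m
  # Assumed values: a long time range is split into subqueries of this
  # interval, and at most this many of them are scheduled in parallel.
  split_queries_by_interval: 1h
  max_query_parallelism: 16
```

Because a range query is split by `split_queries_by_interval`, a larger time frame produces more subqueries passing through the scheduler, which would at least fit the observation that the errors become more frequent with larger time frames. Whether a different interval or a longer timeout actually reduces them is untested here.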