Hi again,
unfortunately the issue is not solved. It is still happening on Ubuntu 22.04, though with fewer errors …
If I understand the error messages correctly, one part of Loki (the querier?) is losing its connection to the scheduler residing on the same host, right?
The errors also seem to appear more often the larger the queried time range is.
Is there a timeout parameter that I could try fiddling around with?
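For reference, these are the timeout- and splitting-related settings I have found so far in the Loki 2.x config reference and in the configs quoted below; the values are only guesses on my side, not recommendations. Since larger time ranges get split into more sub-queries, the splitting interval might also explain why bigger queries produce more of these lines:
```yml
# Settings I am considering fiddling with (names taken from the Loki 2.x config
# reference and the configs quoted below; values are guesses, not recommendations):
server:
  http_server_read_timeout: 300s    # default 30s
  http_server_write_timeout: 300s   # default 30s
querier:
  query_timeout: 5m                 # default 1m
  engine:
    timeout: 5m                     # LogQL engine timeout
limits_config:
  split_queries_by_interval: 1h     # larger ranges are split into sub-queries of this length
```
If anyone knows which of these actually governs the querier↔scheduler/frontend gRPC calls from the error messages, that would already help a lot.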
I did a lot of internet research on this; many others have the same, or at least similar, problems with Loki, and nobody seems to know the exact cause or a workaround:
Quoted from a GitHub issue (component/loki, opened 9 Nov 2022):
Hi,
- Docker version 20.10.17, build 100c701
- Docker Compose version v2.10.2
- Ubuntu WSL2
- loki:2.6.1
I am getting a lot of those errors:
```
loki-1 | level=error ts=2022-11-09T16:43:52.196413829Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
loki-1 | level=error ts=2022-11-09T16:43:52.196413176Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
loki-1 | level=error ts=2022-11-09T16:43:52.196428849Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
loki-1 | level=error ts=2022-11-09T16:43:52.196462256Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
loki-1 | level=error ts=2022-11-09T16:43:52.196970284Z caller=scheduler_processor.go:182 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=0.0.0.0:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197027425Z caller=scheduler_processor.go:137 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197058086Z caller=scheduler_processor.go:182 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=0.0.0.0:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197084426Z caller=scheduler_processor.go:137 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197793087Z caller=scheduler_processor.go:182 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=0.0.0.0:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197823878Z caller=scheduler_processor.go:182 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=0.0.0.0:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197833438Z caller=scheduler_processor.go:137 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197837788Z caller=scheduler_processor.go:137 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197860028Z caller=scheduler_processor.go:182 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=0.0.0.0:9096
loki-1 | level=error ts=2022-11-09T16:43:52.197879088Z caller=scheduler_processor.go:137 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
```

I found a similar issue related to the frontend address and added it to my config, but it didn't help:
```
frontend:
  address: 0.0.0.0
```
Here is my docker-compose.yml
```yml
loki:
  image: grafana/loki:2.6.1
  user: root
  volumes:
    - ./Docker/compose/local/loki:/etc/loki
    - loki_data:/home/loki/data
  ports:
    - 3100:3100
    - 9096:9096
  restart: unless-stopped
  command: -config.file=/etc/loki/loki-config.yml
```
My loki-config.yml is based on complete-local-config.yaml from the docs:
```yml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn

frontend:
  address: 0.0.0.0

common:
  path_prefix: /home/loki/data
  storage:
    filesystem:
      chunks_directory: /home/loki/data/chunks
      rules_directory: /home/loki/data/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

query_scheduler:
  max_outstanding_requests_per_tenant: 1000
```
Any ideas what is causing it?
Thank you
Quoted from a second GitHub issue (component/query-frontend, opened 28 Jan 2023, closed 14 Mar 2023):
**Describe the bug**
Loki shows hundreds of errors, reporting connectivity issues on the Querier and Scheduler:
``` bash
2023-01-28 20:14:45 | level=error ts=2023-01-28T19:14:44.943214073Z caller=scheduler_processor.go:137 org_id=fake msg="error notifying scheduler about finished query" err=EOF addr=127.0.0.1:9096
2023-01-28 20:14:45 | level=error ts=2023-01-28T19:14:44.943190233Z caller=scheduler_processor.go:182 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = Canceled desc = context canceled" frontend=172.18.0.15:9096
level=error ts=2023-01-28T19:36:23.572333721Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
```
I have tried several configurations on the Loki side, but whatever config I set, these errors keep being reported, hundreds of them per minute.
I ran some minimal tests and can see that port 9096 is up and accepting connections:
```
devuser@4302e28a2288:~$ telnet loki 9096
Trying 172.18.0.15...
Connected to loki.
Escape character is '^]'.
```
From Grafana, I can see logs from Loki + Grafana, but I don't know whether any logs are missing or not shown because of these errors.
**To Reproduce**
Steps to reproduce the behavior:
1. Started Loki (SHA or version): [grafana/loki:2.7.2], deployed in monolithic mode
2. Grafana [v9.3.2 (21c1d14e91)]
3. Docker Version: 20.10.3
I tried several configurations and checked previous issues; whatever configuration I use, I always find these errors reported by Loki.
Loki config from this issue (https://github.com/grafana/loki/issues/4613#issuecomment-1018367471), plus an S3 MinIO config:
``` yaml
target: all
auth_enabled: false

server:
  http_listen_address: 0.0.0.0
  http_listen_port: 3100
  http_server_read_timeout: 60s # allow longer time span queries
  http_server_write_timeout: 60s # allow longer time span queries
  grpc_listen_address: 0.0.0.0
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 33554432 # 32MiB (int bytes), default 4MB
  grpc_server_max_send_msg_size: 33554432 # 32MiB (int bytes), default 4MB
  log_level: info

common:
  path_prefix: /tmp/data/loki
  storage:
    s3:
      endpoint: minio:9000
      insecure: true
      region: es-patata-1
      bucketnames: test-logs
      access_key_id: root
      secret_access_key: SuperPassword1
      s3forcepathstyle: true
  # storage:
  #   filesystem:
  #     chunks_directory: /tmp/data/loki/chunks
  #     rules_directory: /tmp/data/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-11-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    shared_store: s3
    active_index_directory: /tmp/data/loki/index
    cache_location: /tmp/data/loki/boltdb-cache
    cache_ttl: 72h

ingester:
  wal:
    enabled: true
    dir: /tmp/data/loki/wal
  lifecycler:
    final_sleep: 0s
  chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0 # Chunk transfers disabled

ingester_client:
  grpc_client_config:
    max_send_msg_size: 33554432 # 32MiB, default = 16777216
    max_recv_msg_size: 33554432

frontend:
  max_outstanding_per_tenant: 2048 # default = 100
  compress_responses: true # default = false
  log_queries_longer_than: 20s

frontend_worker:
  grpc_client_config:
    max_send_msg_size: 33554432 # 32MiB, default = 16777216
    max_recv_msg_size: 33554432

querier:
  max_concurrent: 20

query_scheduler:
  max_outstanding_requests_per_tenant: 2048
  grpc_client_config:
    max_send_msg_size: 33554432 # 32MiB, default = 16777216
    max_recv_msg_size: 33554432

query_range:
  split_queries_by_interval: 0 # 720h # 30d

compactor:
  working_directory: /tmp/data/loki/boltdb-shipper-compactor
  shared_store: s3
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  retention_period: 91d
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 8 # <float> | default = 4
  ingestion_burst_size_mb: 16 # <int> | default = 6
  per_stream_rate_limit: 5MB # <string|int> | default = "3MB"
  per_stream_rate_limit_burst: 15MB # <string|int> | default = "15MB"
  max_query_length: 2165h # <duration> | default = 721h
  max_query_lookback: 90d

ruler:
  storage:
    type: local
    local:
      directory: /tmp/data/loki/rules # volume, directory to scan for rules
  rule_path: /tmp/data/loki/rules-temp # volume, store temporary rule files
  alertmanager_url: http://alertmanager:9093
  enable_alertmanager_v2: true
  #alertmanager_client:
  #  basic_auth_username: "{{ loki_alertmanager_username }}"
  #  basic_auth_password: "{{ loki_alertmanager_password }}"

analytics:
  reporting_enabled: false
```
I also followed the indications from the documentation, but those configs throw these errors as well:
https://github.com/grafana/loki/issues/7649
https://raw.githubusercontent.com/grafana/loki/v2.7.0/production/docker-compose.yaml
**Expected behavior**
I expect a configuration which doesn't throw this massive amount of error logs, as all these errors suggest Loki is not working properly.
**Environment:**
- Infrastructure: docker
- Deployment tool: docker-compose
**Screenshots, Promtail config, or terminal output**
```
2023-01-28 20:36:23 | level=error ts=2023-01-28T19:36:23.572333721Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
2023-01-28 20:36:23 | level=error ts=2023-01-28T19:36:23.572310831Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
2023-01-28 20:36:23 | level=error ts=2023-01-28T19:36:23.572268611Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
2023-01-28 20:36:23 | level=error ts=2023-01-28T19:36:23.572244731Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
2023-01-28 20:36:23 | level=error ts=2023-01-28T19:36:23.572228431Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
2023-01-28 20:36:23 | level=error ts=2023-01-28T19:36:23.572227461Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="context canceled"
... (the same retry.go:73 "context canceled" line repeats dozens more times within the same second, differing only in the ts value)
```
My Loki logs are producing lots and lots of “context canceled” errors, which correlate quite nicely with my running queries (through Grafana).
Also note that the queries are executed without apparent problems and return logs just fine.
However, I don’t really understand the error messages (pretty much a noob here).
My setup so far is mostly for testing purposes, so it is local (using the 2.7.1 Docker image) and not scaled whatsoever.
There seems to be a related issue which was reported here:
H…
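Coming back to my question about the querier and scheduler on the same host: both quoted configs set `frontend.address: 0.0.0.0`, and that exact address shows up in the error lines as `frontend=0.0.0.0:9096`. One experiment I am considering for my single-binary deployment is pinning this to loopback instead; this is purely a guess on my part, not a known fix:
```yml
# Experiment only -- option names copied from the configs quoted above; I do not know
# whether loopback is the right address to advertise in a single-binary deployment.
frontend:
  address: 127.0.0.1   # instead of 0.0.0.0, so the scheduler notifies the frontend over loopback
common:
  ring:
    instance_addr: 127.0.0.1
```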
Regards,
Kinnison