Hello,
I have Loki 3.1 running in monolithic mode on a single instance via Docker, with AWS S3 as the storage backend. When I open Grafana's Explore page to query Loki, set the time range to the last 5 minutes, and open the label drop-down, the label list just spins until it times out (after roughly a minute), and then the Loki process dies. This only happens when there have been no logs within the past 5 minutes, even though there were logs a few hours earlier. If there are logs within the past 5 minutes, the label drop-down populates as expected.
I would expect the labels either to come back empty immediately, or for Loki to be smart enough to return the labels present in the logs from a few hours earlier. What is the expected Loki behavior here?
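For reference, the request Grafana issues for that drop-down should be equivalent to calling Loki's labels API directly. A rough way to reproduce it outside Grafana looks like this (host and port from my setup; the start/end nanosecond timestamps below are only illustrative placeholders for a 5-minute window):

# call the labels endpoint directly with an explicit 5-minute range
# (timestamps are placeholders, not the actual values I used)
curl -G -s "http://localhost:3100/loki/api/v1/labels" \
  --data-urlencode "start=1723840663000000000" \
  --data-urlencode "end=1723840963000000000"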
A similar issue occurs when actually running a query. If I enter the query directly in Explore's code mode with the time range set to the last 5 minutes, and there are no logs in that window, the query spins until it times out instead of coming back empty immediately, and the Loki process dies shortly after. Similarly, if I set the time range to the last 30 days, it spins until it times out even though there are logs in that period.
Another observation: if I have recent logs, I am able to search them. I can progressively widen the time range up to a point, say 7 days, at which it times out. On other attempts I was able to search all the way back to 30 days without a timeout. There are not that many unique log entries to return.
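The equivalent direct call for the query itself would be something like the following (the {job="my-app"} selector is just a placeholder for one of my label selectors, and the timestamps are illustrative):

# query_range over an explicit window; selector and timestamps are placeholders
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="my-app"}' \
  --data-urlencode "start=1723840663000000000" \
  --data-urlencode "end=1723840963000000000" \
  --data-urlencode "limit=100"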
The Loki configuration:
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn
  grpc_server_max_concurrent_streams: 1000

common:
  path_prefix: /tmp/loki
  storage:
    s3:
      bucketnames: my-bucket-name
      region: us-west-2
      endpoint: s3.us-west-2.amazonaws.com
      sse:
        type: SSE-S3
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

limits_config:
  retention_period: 2160h # 90 days

compactor:
  working_directory: /loki/retention
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: aws

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  aws:
    bucketnames: my-bucket-name
    endpoint: s3.us-west-2.amazonaws.com
    region: us-west-2
    sse:
      type: SSE-S3

schema_config:
  configs:
    - from: 2020-01-01
      store: tsdb
      object_store: aws
      schema: v13
      index:
        prefix: index_
        period: 24h

analytics:
  reporting_enabled: false
The docker command to run Loki:
sudo docker run --detach --name loki --restart unless-stopped --network observability --mount type=bind,src=/etc/loki/loki.yaml,dst=/etc/loki/loki.yaml,readonly -p 3100:3100 --log-driver local grafana/loki:3.1.0 -config.file=/etc/loki/loki.yaml -log-config-reverse-order
The AWS S3 policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LokiStorage",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket-name",
        "arn:aws:s3:::my-bucket-name/*"
      ]
    }
  ]
}
The EC2 instance has an IAM instance profile with a role that includes this policy.
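To rule out a basic permissions problem, S3 access can be sanity-checked from the instance itself with the AWS CLI, for example (this uses the instance profile; "index/" is my assumption about the prefix the tsdb shipper uploads the index under):

# list the bucket root and the assumed index prefix from the EC2 instance
aws s3 ls s3://my-bucket-name/ --region us-west-2
aws s3 ls s3://my-bucket-name/index/ --region us-west-2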
Loki logs when the label query finally times out (after which the process dies):
RequestCanceled: request context canceled
caused by: context deadline exceeded
error initialising module: store
github.com/grafana/dskit/modules.(*Manager).initModule
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:138
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run
/src/loki/pkg/loki/loki.go:458
main.main
/src/loki/cmd/loki/main.go:129
runtime.main
/usr/local/go/src/runtime/proc.go:271
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1695
level=error ts=2024-08-16T20:28:16.957048386Z caller=index_set.go:306 table-name=index_19951 msg="sync failed, retrying it" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
level=error ts=2024-08-16T20:28:16.957266279Z caller=cached_client.go:275 msg="failed to build table cache" table_name=index_19951 err="RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
level=error ts=2024-08-16T20:28:16.957281769Z caller=index_set.go:306 table-name=index_19951 msg="sync failed, retrying it" err="RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
ts=2024-08-16T20:28:16.957290159Z caller=spanlogger.go:109 table-name=index_19951 method=indexSet.Init level=error msg="failed to initialize table, cleaning it up" table=index_19951 err="RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
level=error ts=2024-08-16T20:28:16.957709875Z caller=log.go:216 msg="error running loki" err="RequestCanceled: request context canceled\ncaused by: context deadline exceeded\nerror initialising module: store\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:138\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108\ngithub.com/grafana/loki/v3/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:458\nmain.main\n\t/src/loki/cmd/loki/main.go:129\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"
If I bypass Grafana and run the labels query with logcli instead:
$ ./logcli-linux-amd64 labels
2024/08/16 20:42:43 http://localhost:3100/loki/api/v1/labels?end=1723840963266126671&start=1723837363266126671
2024/08/16 20:43:38 error sending request Get "http://localhost:3100/loki/api/v1/labels?end=1723840963266126671&start=1723837363266126671": EOF
2024/08/16 20:43:38 Error doing request: run out of attempts while querying the server
And then Loki dies.
Is there something wrong with the configuration?
Any help would be greatly appreciated! Thank you!
EDIT: The setup works correctly if I point Loki at the example filesystem-based config instead of S3.
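For comparison, the filesystem variant follows the getting-started example config; its storage section looks roughly like this (directory paths are the example's defaults, and object_store is set to filesystem in schema_config):

# sketch of the filesystem storage block, based on the Loki getting-started example
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory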