Hello Everyone
We are experiencing extremely long query durations and timeouts from Loki:
We’re hosting Grafana on a local machine. We are using Alloy to tail log files and forward logs to Loki. There are a couple of high-volume log files (80KB/s), which are truncated once they reach 200MB. Alloy has an issue recognizing these truncations, so we have a scheduled task that restarts it every 30 minutes. If this isn’t done, most of the logs from these high-volume log files, which make up about 80-90% of our total log volume, are not ingested.
The performance problems started a couple of days after we introduced this scheduled task. After I discovered the performance problems, I disabled the scheduled task and restarted Loki, which resolved the issue.
I am now trying to find out what caused this to happen, as we used this scheduled-task workaround in the past on a less powerful machine with a similar log volume and did not encounter issues this severe. I can definitively say that it is not a resource issue, as memory and CPU usage on the machine our Grafana stack runs on never exceeded 50% and 15%, respectively.
I can provide further details and more metrics about Alloy / Loki to troubleshoot the issue.
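For reference, the Alloy side is a plain tail-and-forward pipeline along the lines of the sketch below. The paths, labels, and component names here are placeholders, not our actual config:

// Sketch of the kind of tail-and-forward pipeline described above.
// Paths, labels, and the endpoint are placeholders.
local.file_match "app_logs" {
  path_targets = [{"__path__" = "D:/logs/*.log", "job" = "app"}]
}

loki.source.file "app_logs" {
  targets    = local.file_match.app_logs.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}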
Your problem is most likely not related to whatever you are doing with Alloy.
Please share your Loki configuration.
Thanks for replying!
That makes sense to me, yeah. Here’s the Loki config.yaml:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: debug
  grpc_server_max_concurrent_streams: 1000

common:
  instance_addr: 127.0.0.1
  path_prefix: D:\loki
  storage:
    filesystem:
      chunks_directory: D:\loki\chunks
      rules_directory: D:\loki\rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

limits_config:
  metric_aggregation_enabled: true
  enable_multi_variant_queries: true
  allow_structured_metadata: true
  discover_log_levels: true
  retention_period: 336h
  volume_enabled: true

compactor:
  retention_enabled: true
  delete_request_store: filesystem

schema_config:
  configs:
    - from: 2025-09-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: D:\loki

pattern_ingester:
  enabled: true
  metric_aggregation:
    loki_address: localhost:3100

ruler:
  alertmanager_url: http://localhost:9093

frontend:
  encoding: protobuf

analytics:
  reporting_enabled: false
Seems to be a pretty standard configuration.
When you say you implemented a process to restart Alloy when logs are truncated, do you notice any duplicated ingestion, perhaps? Is your Alloy process keeping state correctly so as not to send duplicated logs to Loki? If you are keeping metrics on Loki, I’d check the ingestion rate and see if there is any difference there.
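If you aren’t already scraping Loki’s own /metrics endpoint, something roughly like the Alloy block below would let you do that (the component names and the remote_write URL are placeholders for wherever you keep metrics):

// Sketch: scrape Loki's own /metrics so the ingestion rate can be compared
// with and without the scheduled restarts. The remote_write URL is a placeholder.
prometheus.scrape "loki" {
  targets    = [{"__address__" = "localhost:3100"}]
  forward_to = [prometheus.remote_write.metrics.receiver]
}

prometheus.remote_write "metrics" {
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}

Graphing something like rate(loki_distributor_bytes_received_total[5m]) with and without the restart task should make it visible: a spike right after each scheduled restart would point to Alloy re-reading the truncated files and sending duplicates.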