Loki Performance Problems

Hello Everyone

We are experiencing extremely long query durations and timeouts from Loki.

We’re hosting Grafana on a local machine and using Alloy to tail log files and forward them to Loki. There are a couple of high-volume log files (80 KB/s) that are truncated once they reach 200 MB. Alloy has an issue recognizing these truncations, so we have a scheduled task that restarts it every 30 minutes. Without this, most of the logs from these high-volume files, which make up about 80-90% of our total log volume, are not ingested.
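
For reference, the gaps are easy to see by graphing ingested lines per file; a LogQL metric query along these lines works (the {job="alloy"} selector and the filename label are placeholders, not our actual labels):

# lines ingested per file in 5-minute buckets (placeholder selector)
sum by (filename) (count_over_time({job="alloy"}[5m]))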

The performance problems started a couple of days after we introduced this scheduled task. Once I discovered them, I disabled the scheduled task and restarted Loki, which resolved the issue.

I am now trying to find out what caused this, since we used the same scheduled-task workaround in the past on a less powerful machine with a similar log volume and did not encounter issues this severe. I can say definitively that it is not a resource issue: memory and CPU usage on the machine running our Grafana stack never exceeded 50% and 15%, respectively.

I can provide further details and more metrics about Alloy/Loki to help troubleshoot the issue.

Your problem is most likely not related to whatever you are doing with Alloy.

Please share your Loki configuration.

Thanks for replying!

That makes sense to me, yeah. Here’s the Loki config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: debug
  grpc_server_max_concurrent_streams: 1000

common:
  instance_addr: 127.0.0.1
  path_prefix: D:\loki
  storage:
    filesystem:
      chunks_directory: D:\loki\chunks
      rules_directory: D:\loki\rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

limits_config:
  metric_aggregation_enabled: true
  enable_multi_variant_queries: true
  allow_structured_metadata: true
  discover_log_levels: true
  retention_period: 336h
  volume_enabled: true
  
compactor:
  retention_enabled: true
  delete_request_store: filesystem

schema_config:
  configs:
    - from: 2025-09-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
        
storage_config:
  filesystem:
    directory: D:\loki

pattern_ingester:
  enabled: true
  metric_aggregation:
    loki_address: localhost:3100

ruler:
  alertmanager_url: http://localhost:9093

frontend:
  encoding: protobuf

analytics:
  reporting_enabled: false

Seems to be a pretty standard configuration.

When you say you implemented a process to restart Alloy when logs are truncated, do you notice any duplicated ingestion, perhaps? Is your Alloy process keeping state correctly so as not to send duplicated logs to Loki? If you are keeping metrics on Loki, I’d check the ingestion rate and see if there is any difference there.
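
For example (just a sketch; swap in a selector that matches your streams), graphing hourly ingested volume over the affected days with a LogQL query like this should make any jump from duplicated ingestion obvious:

# total bytes ingested per hour for the selected streams (placeholder selector)
sum(bytes_over_time({job="alloy"}[1h]))

If Prometheus is scraping Loki’s /metrics endpoint, the distributor’s received-bytes counter (loki_distributor_bytes_received_total) should show the same change from the ingestion side.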