Loki performance improvement in simple scalable mode

I’m setting up a self-hosted Loki deployment on AWS EC2 (m4.xlarge) using the simple scalable deployment mode, with AWS S3 as the object store. Here’s what my setup looks like:

  • 6 read pods
  • 3 write pods
  • 3 backend pods
  • 1 read-cache and 1 write-cache pod (using Memcached)
  • CPU usage is under 10%, and I have around 8 GiB of free RAM.

Despite this, query performance is very poor. Even a basic query over the last 30 minutes (~2.1 GB of data) times out and takes 2–3 tries to complete, which feels far too slow. In many cases queries never finish, and I haven't found any helpful errors in the logs.

I suspect the issue is related to parallelization settings or chunk-related configs (chunk size, or how long chunks sit before being flushed), but I'm having a hard time figuring out an ideal configuration.

My goal is to fully utilize the available AWS resources and bring query times down to a few seconds for small queries, and ideally no more than ~30 seconds for large queries over tens of GBs.

Would really appreciate any insights, tuning tips, or configuration advice from anyone who's had success optimizing Loki performance in a similar setup.

My current Loki configuration:

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

memberlist:
  join_members:
    - loki-backend:7946 
  bind_port: 7946

common:
  replication_factor: 3
  compactor_address: http://loki-backend:3100
  path_prefix: /var/loki
  storage:
    s3:
      bucketnames: stage-loki-chunks
      region: ap-south-1
  ring:
    kvstore:
      store: memberlist

compactor:
  working_directory: /var/loki/retention
  compaction_interval: 10m
  retention_enabled: false  # Disabled retention deletion

ingester:
  chunk_idle_period: 1h
  wal:
    enabled: true
    dir: /var/loki/wal
  max_chunk_age: 1h
  chunk_retain_period: 3h
  chunk_encoding: snappy
  chunk_target_size: 5242880
  chunk_block_size: 262144

limits_config:
  allow_structured_metadata: true
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  split_queries_by_interval: 15m
  max_query_parallelism: 32
  max_query_series: 10000
  query_timeout: 5m
  tsdb_max_query_parallelism: 32

# Write path caching (for chunks)
chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 64
      parallelism: 8
    memcached_client:
      addresses: write-cache:11211
      max_idle_conns: 16
      timeout: 200ms

# Read path caching (for query results)
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      default_validity: 24h
      memcached:
        expiration: 24h
        batch_size: 64
        parallelism: 32
      memcached_client:
        addresses: read-cache:11211
        max_idle_conns: 32
        timeout: 200ms

pattern_ingester:
  enabled: true

querier:
  max_concurrent: 20

frontend:
  log_queries_longer_than: 5s
  compress_responses: true

ruler:
  storage:
    type: s3
    s3:
      bucketnames: stage-loki-ruler
      region: ap-south-1
      s3forcepathstyle: false

schema_config:
  configs:
    - from: "2024-04-01"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  aws:
    s3forcepathstyle: false
    s3: https://s3.region-name.amazonaws.com
  tsdb_shipper:
    query_ready_num_days: 1
    active_index_directory: /var/loki/tsdb-index
    cache_location: /var/loki/tsdb-cache
    cache_ttl: 24h

Loki’s query performance largely comes from distributing work across queriers, so you’ll want to make sure the query frontend is configured properly (see Query frontend example | Grafana Loki documentation) and that query splitting is enabled.
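As a rough illustration, the knobs that control how much a query gets split and fanned out live in limits_config, querier, and frontend. The values below are illustrative starting points, not recommendations, so tune them against your data volume and double-check the option names against the Loki version you run:

limits_config:
  split_queries_by_interval: 15m   # each query is broken into 15m sub-queries
  max_query_parallelism: 64        # cap on sub-queries the frontend schedules in parallel
  tsdb_max_query_parallelism: 128  # same cap, applied when the TSDB index is used

querier:
  max_concurrent: 8                # sub-queries one querier works on at a time

frontend:
  max_outstanding_per_tenant: 2048 # per-tenant queue depth in the query frontend

A common rule of thumb is to keep max_query_parallelism at or below (number of queriers × querier.max_concurrent); otherwise sub-queries just sit in the frontend queue, and with 6 read pods that product is what determines how much splitting actually buys you.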

If you hit performance issues with Loki, take a look at VictoriaLogs. It supports the same log streams concept as Loki, but is much more efficient at querying. It is also much easier to configure and operate than Loki. See this article for technical details.