Count_over_time query causes OOM on loki-read pod with 8GB limit

Hi everyone,

I’m running into a recurring issue with Loki (v3.3.0) in a multi-tenant setup. We store over 600GB of logs across multiple tenants, with 6 months of retention.

When running the following query on one of the tenants:

count_over_time({env="prod-1"}[15m])

…it crashes the loki-read pod with an OOM kill, even though the container has an 8GB memory limit.

This happens fairly consistently, even for queries with short range windows (e.g. 15 minutes), and particularly when using count_over_time.


Our setup:

  • Deployed using Helm in simple scalable mode
  • 3x loki-read, 3x loki-write, 3x loki-backend (chunk cache)
  • Storage: TSDB index, with chunks backed by Azure Blob Storage
  • Index retention: 24h
  • Total log volume: ~600GB over 6 months

My questions:

  1. What memory-related settings should I tune to allow this type of query to complete successfully?
  2. Is there any way to limit the amount of data count_over_time pulls into memory?
  3. Would running the query against a narrower log stream label set (i.e. more stream labels in the selector) help, or does count_over_time still pull in too much data regardless? (See the example below for what I mean by narrower.)
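
To illustrate what I mean by narrower: adding more stream labels to the selector so that fewer streams are matched before the range aggregation runs. The app and level labels below are only hypothetical examples, not labels we necessarily have:

count_over_time({env="prod-1", app="payments", level="error"}[15m])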

Any best practices for query optimization or memory tuning in long-term, multi-tenant setups would be highly appreciated.

Thanks in advance!

  1. In general you want more queriers, each with a smaller individual footprint. For example, if you have 3 reads with 8GB of memory each, it would be better to run 6 with 4GB each (see the values sketch after this list).
  2. Make sure you have query splitting configured and working properly; the same sketch shows the relevant limits_config settings.
  3. What’s the duration of your query? 1 day? 2 days? More?
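
To make points 1 and 2 concrete, here is a rough sketch of the relevant values for the grafana/loki Helm chart in simple scalable mode. The replica counts, memory sizes, and limit values are placeholders to adapt to your own workload rather than recommendations, and key names can shift slightly between chart versions:

read:
  replicas: 6                       # more, smaller readers instead of 3 large ones
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 4Gi                   # smaller per-pod ceiling than the current 8Gi

loki:
  limits_config:
    split_queries_by_interval: 30m  # break the query time range into smaller sub-queries
    max_query_parallelism: 32       # how many sub-queries may run concurrently per tenant
    max_chunks_per_query: 2000000   # hard cap on chunks a single query may touch
    query_timeout: 5m               # fail fast instead of grinding a reader into OOM

With splitting in place, the sub-queries are fanned out across all readers, so no single read pod has to materialize the whole result. Keep in mind that the [15m] is only the per-sample range window; the total time range you run the query over (hence question 3 above) is what really drives memory.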