Speeding up labels and series LogQL throughput

Hello there!

I recently worked on speeding up our current Loki deployment.

My main change was switching the storage engine from boltdb to tsdb, which gave a nice speedup for filter (750 MB/sec to 1.3 GB/sec) and metric (~450 MB/sec to 3.82 GB/sec) queries.

However, labels and series are still very slow, capping at about 50 MB/sec.

Specifically, I’m looking at this query from the Loki 2.0 Global Metrics dashboard (Loki2.0 Global Metrics | Grafana Labs):

histogram_quantile(0.95, sum(rate(loki_logql_querystats_bytes_processed_per_seconds_bucket[5m])) by (le,type))

[Graph: last 24h operation]

[Graph: last 24h operation, excluding series and metric]

The values for labels and series seem to be hard-capped at 47.5 MB/second.
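One thing worth ruling out is whether that ceiling is an artifact of the histogram rather than a real throughput limit. histogram_quantile interpolates linearly inside a bucket, so if every labels/series observation lands in a lowest bucket whose upper bound is 50 MB (an assumption, I have not checked the actual bucket boundaries), the 0.95 quantile would come out as 0.95 × 50 MB = 47.5 MB. A minimal sketch for inspecting the per-bucket distribution, assuming the type label values are labels and series:

```promql
# Per-bucket observation rate for labels and series queries only.
# If nearly all observations sit in the lowest "le" bucket, the quantile
# is bounded by that bucket's upper limit and says little about real throughput.
sum(rate(loki_logql_querystats_bytes_processed_per_seconds_bucket{type=~"labels|series"}[5m])) by (le, type)
```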

What can I look into in order to speed up labels and series queries?

To add some context, we’re using loki-distributed (Loki version 2.9.6), querying against storage in S3.

Edit: We are using Redis for cache in the following places:

  • index_queries_cache_config
  • query_range.results_cache
  • query_range.index_stat_results_cache
  • chunk_store.chunk_cache_config
  • chunk_store.write_dedupe_cache_config

I suspect one of these caches might actually be hurting performance, and that I’d be better off switching to a smaller (512 MB or 1 GB) embedded cache to avoid network latency, at the cost of memory usage.

I’m not necessarily looking for specific things to do; even being pointed at which documentation pages to read or which other parameters to check would help a lot.

Thanks in advance!

esantoro