Loki is very slow with tsdb compared to boltdb-shipper in single binary mode

I have a single binary mode Loki server with a write path of roughly 5k events per second. Querying all events (with {foo=~".+"}) over a one-hour window takes 10 s with boltdb-shipper as the store, while with tsdb only a 5 minute window of all events can be queried within Grafana's 30 s timeout. This was tested with both Loki 2.9.x and Loki 3.0.0. Am I missing any tsdb-specific configs, perhaps ones specific to single binary mode?

/etc/loki/config.yml

auth_enabled: false

server:
  http_listen_port: 3101
  grpc_listen_port: 9097
  grpc_server_max_recv_msg_size: 20971520
  grpc_server_max_send_msg_size: 20971520

ingester:
  wal:
    enabled: true
    dir: /opt/loki/wal
  chunk_encoding: snappy


common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/
  storage:
    filesystem:
      chunks_directory: /opt/loki/chunks
      rules_directory: /opt/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_scheduler:
  max_outstanding_requests_per_tenant: 32000

limits_config:
  ingestion_burst_size_mb: 4096
  ingestion_rate_mb: 2048
  per_stream_rate_limit: 1024M
  per_stream_rate_limit_burst: 2048M
  retention_period: 7d
  max_global_streams_per_user: 0
  allow_structured_metadata: false  # switched between false and true while testing

schema_config:
  configs:
    - from: 2024-05-01
      store: "boltdb-shipper / tsdb"
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  working_directory: /opt/loki/compactor 
  compaction_interval: 10m
  retention_enabled: true 
  retention_delete_delay: 15m
  retention_delete_worker_count: 150
  delete_request_store: filesystem

table_manager:
  retention_deletes_enabled: true
  retention_period: 7d

I don’t think there is any index configuration specific to single-instance mode.
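
For reference, the tsdb index shipper directories can also be set explicitly under storage_config; this is a minimal sketch, and the paths are illustrative rather than taken from the config above:

storage_config:
  tsdb_shipper:
    active_index_directory: /opt/loki/tsdb-index  # where new index files are written
    cache_location: /opt/loki/tsdb-cache          # local cache for index files fetched at query time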

What does your query splitting look like? And what’s your query_ingesters_within set to?

@tonyswumac: split_queries_by_interval is set to 1h, and query_ingesters_within is set to 1h greater than max_chunk_age.
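
For reference, those options live in different config sections; a minimal sketch, with values assumed from the Loki defaults (max_chunk_age defaults to 2h) rather than copied from the setup above:

limits_config:
  split_queries_by_interval: 1h

querier:
  query_ingesters_within: 3h  # max_chunk_age + 1h

ingester:
  max_chunk_age: 2h           # default value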

I found two problems with my setup:

  1. The system was limited by the amount of available memory. With more memory, tsdb and boltdb-shipper perform about the same.
  2. When comparing 2.9.x with 3.0.0, I found that the default for the querier's max_concurrent changed (old: 10, new: 4), which on a single instance effectively caps the number of CPU cores the Loki server process can use for queries (see the sketch below).
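
To restore the 2.9.x query concurrency, the old value can be set explicitly; a minimal sketch:

querier:
  max_concurrent: 10  # Loki 3.0 default is 4; 2.9.x default was 10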