Loki is very slow with tsdb compared to boltdb-shipper in single binary mode

I have a Loki server running in single binary mode with about 5k events per second on the write path. Querying all events (with {foo=~".+"}) for a one-hour window takes 10 seconds with boltdb-shipper as the store, while with tsdb only a 5-minute window of all events can be queried within Grafana's 30-second timeout. This was tested with both Loki 2.9.x and Loki 3.0.0. Am I missing any tsdb-specific configs, maybe ones specific to single binary mode?

/etc/loki/config.yml

auth_enabled: false

server:
  http_listen_port: 3101
  grpc_listen_port: 9097
  grpc_server_max_recv_msg_size: 20971520
  grpc_server_max_send_msg_size: 20971520

ingester:
  wal:
    enabled: true
    dir: /opt/loki/wal
  chunk_encoding: snappy


common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/
  storage:
    filesystem:
      chunks_directory: /opt/loki/chunks
      rules_directory: /opt/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_scheduler:
  max_outstanding_requests_per_tenant: 32000

limits_config:
  ingestion_burst_size_mb: 4096
  ingestion_rate_mb: 2048
  per_stream_rate_limit: 1024M
  per_stream_rate_limit_burst: 2048M
  retention_period: 7d
  max_global_streams_per_user: 0 
  allow_structured_metadata: false # false with boltdb-shipper, true with tsdb

schema_config:
  configs:
    - from: 2024-05-01
      store: boltdb-shipper # boltdb-shipper or tsdb, depending on the test
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  working_directory: /opt/loki/compactor 
  compaction_interval: 10m
  retention_enabled: true 
  retention_delete_delay: 15m
  retention_delete_worker_count: 150
  delete_request_store: filesystem

table_manager:
  retention_deletes_enabled: true
  retention_period: 7d

I don’t think there is any index configuration specific to single instance.

What does your query splitting look like? And what’s your query_ingesters_within set to?

@tonyswumac: split_queries_by_interval is set to 1h. query_ingesters_within is set to 1h greater than max_chunk_age.
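
For reference, a minimal sketch of where those two settings live in the config (assuming max_chunk_age is left at its 2h default, which puts query_ingesters_within at 3h):

limits_config:
  split_queries_by_interval: 1h

querier:
  # assumption: max_chunk_age at its 2h default, so one hour greater is 3h
  query_ingesters_within: 3h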

I found two problems with my setup:

  1. The system was limited by the amount of memory available. With more memory, tsdb and boltdb-shipper perform about the same.
  2. When comparing 2.9.x with 3.0.0 I found that the default value for the querier's max_concurrent changed (old: 10, new: 4), which on a single instance effectively limits the number of CPU cores the Loki server process can use for queries (see the snippet below).
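
A minimal sketch of overriding that new default, assuming the host has enough CPU cores to spare:

querier:
  # Loki 3.0 lowered the default from 10 to 4; raise it back if cores are available
  max_concurrent: 10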
