Queries Process Too Many Bytes

I used logcli to run a 1-hour metric query; the logcli --stats output says the query processed 5.0GB of uncompressed bytes.

Store.DecompressedBytes: 5.0GB

This doesn't make sense: my cluster as a whole ingests around 300KB per second, so even if the query pulled every single chunk ingested in the 1-hour range I'm querying, it would only need to process 1.08GB, not to mention that I'm filtering the logs by the filename label.
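For reference, here's the back-of-envelope math behind that 1.08GB figure (using decimal units, 1GB = 10^9 bytes, which is what logcli appears to report):

```python
# Back-of-envelope: total bytes ingested cluster-wide over the 1-hour
# query window versus what logcli reports as processed.
ingest_rate_bytes = 300 * 1000          # ~300KB/s, cluster-wide
window_seconds = 3600                   # 1-hour query range

expected_bytes = ingest_rate_bytes * window_seconds
print(expected_bytes / 1e9)             # 1.08 (GB)

# logcli reported Store.DecompressedBytes: 5.0GB
reported_bytes = 5.0e9
print(round(reported_bytes / expected_bytes, 1))  # 4.6x the entire hour's ingest
```

So the query is reportedly touching roughly 4.6 times more data than the whole cluster even ingested during that hour.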

Creating Grafana panels from Loki metric queries is unusable without caching; each query takes forever. I think a good place to start tuning is understanding why each query processes so much data.
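For what it's worth, the caching I'm referring to is the query-results cache under query_range. This is only a sketch of what I'd enable; the exact fifocache field names may differ between Loki versions, so check the docs for yours:

```yaml
# Sketch only: enable the in-memory results cache for metric queries.
# max_size_items and validity values here are placeholders to tune.
query_range:
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        validity: 24h
```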

I'm using boltdb-shipper as the index store and S3 as the chunk store. I thought there was a problem with how index files are saved, that they might be pointing to larger-than-desired sets of data, so I added a compactor instance in an attempt to prevent duplicate chunks from being loaded by queries. This indeed helped, and the same query now reports processing 3.6GB of uncompressed bytes.

Store.DecompressedBytes: 3.6GB

Any idea how exactly chunks are loaded by queriers? Or general thoughts on how to decrease query times?

We are running Loki on VMs: 2 query frontends, 3 queriers (with 12 CPUs each), 4 ingesters, 2 distributors, and 1 compactor.

auth_enabled: false

server:
  graceful_shutdown_timeout: 5s
  grpc_server_max_concurrent_streams: 1000
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_listen_port: 3100
  http_server_idle_timeout: 120s
  http_server_write_timeout: 1m

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  chunk_encoding: snappy
  chunk_block_size: 262144
  chunk_target_size: 4000000
  chunk_idle_period: 15m
  lifecycler:
    heartbeat_period: 5s
    join_after: 30s
    num_tokens: 512
    ring:
      heartbeat_timeout: 1m
      kvstore:
        store: memberlist
      replication_factor: 3
    final_sleep: 0s
  max_transfer_retries: 60

ingester_client:
  grpc_client_config:
    max_recv_msg_size: 67108864
  remote_timeout: 1s

frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  max_outstanding_per_tenant: 1024

frontend_worker:
  frontend_address: frontend:9096
  grpc_client_config:
    max_send_msg_size: 104857600
  parallelism: 12

limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 10
  ingestion_rate_mb: 5
  ingestion_rate_strategy: local
  max_cache_freshness_per_query: 10m
  max_global_streams_per_user: 10000
  max_query_length: 12000h
  max_query_parallelism: 256
  max_streams_per_user: 0
  reject_old_samples: true
  reject_old_samples_max_age: 168h

querier:
  query_ingesters_within: 2h

query_range:
  align_queries_with_step: true
  max_retries: 5 
  parallelise_shardable_queries: true
  split_queries_by_interval: 30m

schema_config:
  configs:
    - from: 2020-06-10
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: test_
        period: 24h

compactor:
  working_directory: /opt/loki/compactor
  shared_store: s3

storage_config:
  aws:
    bucketnames: bucket_names
    endpoint: endpoint
    region: region
    access_key_id: mysecret_key_id
    secret_access_key: mysecret_access_key
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: true
    s3forcepathstyle: true
  boltdb-shipper:
    active_index_directory: /opt/loki/boltdb-shipper-active
    cache_location: /opt/loki/boltdb-shipper-cache
    shared_store: s3
 
memberlist:
  abort_if_cluster_join_fails: false
  bind_addr:
    - the_bind_ip_address
  bind_port: 7946
  join_members:
    - ip_address:7946
    - off_all_the_loki_components:7946