Time until Loki streams are searchable by the querier

In my environment (Grafana 11.1.0 / Loki 3.1.0), low-volume streams take longer to become searchable by the querier than high-volume streams. This is especially problematic for very low-volume streams (one event per day or less), where it can take more than 24 hours until the event is found by the querier. It seems as if the ingester is not properly queried by the querier. Since I send the same event to two different log-aggregation systems, I know that the event was sent without delay. query_ingesters_within is set to 0 (I also tried 96h). Any help is appreciated.
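For reference, the documented meaning of that setting (comment added for clarity; 0 is the value used below):

querier:
  # Maximum lookback beyond which queries are no longer sent to ingesters.
  # 0 means no limit, i.e. the ingesters are queried for every query.
  query_ingesters_within: 0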

querier:
  query_ingesters_within: 0
  max_concurrent: 24

ingester:
  concurrent_flushes: 32
  flush_check_period: 30s
  flush_op_backoff:
    min_period: 10s
    max_period: 1m0s
    max_retries: 10
  flush_op_timeout: 10m0s
  chunk_retain_period: 0s
  chunk_idle_period: 30m0s
  chunk_block_size: 262144
  chunk_target_size: 1572864
  chunk_encoding: snappy
  max_chunk_age: 2h0m0s
  autoforget_unhealthy: false
  sync_period: 1h0m0s
  sync_min_utilization: 0.1
  max_returned_stream_errors: 10
  query_store_max_look_back_period: 2h41m0s
  wal:
    enabled: true

query_scheduler:
  max_outstanding_requests_per_tenant: 32000

limits_config:
  max_global_streams_per_user: 0
  shard_streams:
    enabled: false
  discover_service_name:
  discover_log_levels: false
  reject_old_samples: false
  query_timeout: 30s

schema_config:
  configs:
    - from: 2024-05-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

  1. How are you deploying Loki?
  2. Do you have a server section in your Loki configuration, or is that all of it?

Hi @tonyswumac,
Loki is deployed in single-binary mode.
Below is the complete Loki config.yml:

auth_enabled: false

server:
  http_listen_port: 3101
  grpc_listen_port: 9097
  grpc_server_max_recv_msg_size: 20971520
  grpc_server_max_send_msg_size: 20971520

internal_server:
  http_listen_port: 3111

ingester:
  wal:
    enabled: true
    dir: /opt/loki/loki_default/wal
  chunk_encoding: snappy

common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/loki_default
  storage:
    filesystem:
      chunks_directory: /opt/loki/loki_default/chunks
      rules_directory: /opt/loki/loki_default/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

querier:
  query_ingesters_within: 0
  max_concurrent: 24

query_scheduler:
  max_outstanding_requests_per_tenant: 32000

limits_config:
  ingestion_burst_size_mb: 4096
  ingestion_rate_mb: 2048
  per_stream_rate_limit: 1024M
  per_stream_rate_limit_burst: 2048M
  retention_period: 90d
  max_global_streams_per_user: 0
  shard_streams:
    enabled: false
  discover_service_name:
  discover_log_levels: false
  reject_old_samples: false
  query_timeout: 30s

schema_config:
  configs:
    - from: 2024-05-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  working_directory: /opt/loki/loki_default/compactor
  retention_enabled: true
  delete_request_store: filesystem

table_manager:
  retention_deletes_enabled: true
  retention_period: 90d

Try this:

  1. Change instance_addr to 0.0.0.0.
  2. Add a memberlist configuration with your Loki instance’s IP. This may not be necessary; I’d try just #1 first and see if that fixes it.

Thank you @tonyswumac. I followed your advice, changed instance_addr to 0.0.0.0, and restarted the Loki service. Unfortunately, I do not see much of a difference between binding the Loki server to the loopback interface and binding it to all interfaces.

To your second point, could you please explain in more detail what a memberlist configuration for a single-binary deployment should look like? Until now my understanding was that memberlist is only needed for distributed deployments.

Unfortunately, I am not able to replicate your issue using your configuration. Here is what I have:

docker-compose.yml:

---
networks:
  loki:

services:
  loki:
    image: grafana/loki:3.1.0
    command: "-config.file=/etc/loki/config.yaml -target=all"
    ports:
      - "3101:3101"
      - 7946
      - 9095
    volumes:
      - /root/loki/loki-config.yaml:/etc/loki/config.yaml
      - /root/loki/opt:/opt/loki:rw
    networks:
      loki:
        aliases:
          - loki

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    entrypoint:
      - sh
      - -euc
      - |
        mkdir -p /etc/grafana/provisioning/datasources
        cat <<EOF > /etc/grafana/provisioning/datasources/ds.yaml
        apiVersion: 1
        datasources:
          - name: Loki
            type: loki
            access: proxy
            url: http://loki:3101
        EOF
        /run.sh
    ports:
      - "3000:3000"
    networks:
      - loki

loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3101
  grpc_listen_port: 9097
  grpc_server_max_recv_msg_size: 20971520
  grpc_server_max_send_msg_size: 20971520

internal_server:
  http_listen_port: 3111

ingester:
  wal:
    enabled: true
    dir: /opt/loki/loki_default/wal
  chunk_encoding: snappy

common:
  instance_addr: 0.0.0.0
  path_prefix: /opt/loki/loki_default
  storage:
    filesystem:
      chunks_directory: /opt/loki/loki_default/chunks
      rules_directory: /opt/loki/loki_default/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

querier:
  query_ingesters_within: 0
  max_concurrent: 24

query_scheduler:
  max_outstanding_requests_per_tenant: 32000

limits_config:
  ingestion_burst_size_mb: 4096
  ingestion_rate_mb: 2048
  per_stream_rate_limit: 1024M
  per_stream_rate_limit_burst: 2048M
  retention_period: 90d
  max_global_streams_per_user: 0
  shard_streams:
    enabled: false
  discover_service_name:
  discover_log_levels: false
  reject_old_samples: false
  query_timeout: 30s

schema_config:
  configs:
    - from: 2024-05-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  working_directory: /opt/loki/loki_default/compactor
  retention_enabled: true
  delete_request_store: filesystem

table_manager:
  retention_deletes_enabled: true
  retention_period: 90d

Starting up Loki and a local Grafana instance: docker-compose up -d

Screenshot of the 1-line log that’s pretty much immediately query-able:
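The same check can also be done without Grafana by pushing a line straight to Loki’s HTTP API and querying it back right away; a minimal sketch, assuming the 3101 port mapping above and a made-up job="latency-test" label:

# Push one test line directly to Loki, bypassing any log collector.
NOW_NS=$(date +%s%N)   # current time in nanoseconds (GNU date)
curl -s -X POST "http://localhost:3101/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  --data "{\"streams\":[{\"stream\":{\"job\":\"latency-test\"},\"values\":[[\"${NOW_NS}\",\"hello from curl\"]]}]}"

# Query it back immediately; if the querier reaches the ingester,
# the line shows up well before any chunk is flushed to storage.
curl -s -G "http://localhost:3101/loki/api/v1/query_range" \
  --data-urlencode 'query={job="latency-test"}'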

Hi @tonyswumac, thank you for testing my configuration in your setup. Some differences I see between the two setups:

  • You are using Docker; I am using a direct install.
  • I am running two Loki server processes on the same host (each with separate external and internal ports). Maybe there is some conflict between the two server processes that I do not realize.
  • The two Loki servers have some load: server process 1 handles about 400,000 events/min and server process 2 about 50,000 events/min. At the stream level, the highest-volume streams have about 60 events/minute and the lowest-volume streams only a few events/day.

Ok, this is information I didn’t know. The way Loki storage works is that new logs come into the ingester, get written to the WAL, and are not committed to long-term storage for some time (this depends on your chunk age and chunk size settings). During this time, your querier needs to query the ingester directly to get the logs that aren’t committed to long-term storage yet.
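As a rough illustration, these are the settings that govern that window, using the values from the first config dump in this thread (comments are my paraphrase of the documented behavior):

ingester:
  chunk_idle_period: 30m      # a stream with no new entries is flushed after this long
  max_chunk_age: 2h           # a chunk is flushed once it reaches this age, even if still active
  chunk_target_size: 1572864  # ...or earlier, once the compressed chunk reaches ~1.5 MB

querier:
  query_ingesters_within: 0   # 0 = no lookback limit, so ingesters are queried for every query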

If you are running two Loki servers, then you need to use memberlist to form a cluster between the two.
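A minimal sketch of what that could look like in single-binary mode (hypothetical addresses; adjust the IPs and ports to your hosts):

memberlist:
  bind_port: 7946
  join_members:
    - <ip-of-loki-host-1>:7946
    - <ip-of-loki-host-2>:7946

common:
  ring:
    kvstore:
      store: memberlist   # instead of inmemory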

@tonyswumac, I have two Loki server processes that are supposed to run alongside each other with no contact between them at all. That means if I query server process 1 I should only get results from server 1, and the same for server process 2. That’s why I don’t want a memberlist spanning the two Loki servers. I could still have two separate memberlists, one per server process, but I would not know how to configure that.

I am not sure how else I can be of help, since I am unable to reproduce your problem.

I had assumed that you were running two Loki instances on the same host as a cluster. But if you are not, and they are supposed to be separate entities, then I would recommend separating them physically as well and seeing if that fixes your problem.

Hi @tonyswumac, thanks again for your help. I finally found the problem in my setup. It was not in Loki itself but in the preceding log-collection pipeline, which did not push low-volume streams to Loki in a timely manner (in contrast to the second channel going to the other log-aggregation system). After upgrading the log collector, the issue is resolved.