Inconsistent query responses when querying ingesters

When querying our Loki cluster we’re seeing inconsistent results for queries that hit the ingesters (queries for the most recently ingested logs that haven’t yet been written to the chunk store). We have the querier configured to query the ingesters for anything within the last 2 hours (query_ingesters_within: 2h). Queries inside this time range seem to jump between different subsets of results: run a query and get one set of logs, run the same query again and get a different set, run it again and get the first set back.

I’m not sure if this is expected behaviour, or a problem with how we have configured Loki. How does the querier work under the hood when querying ingesters? Would it send the query to all ingesters or only a subset? We don’t see the same behaviour for queries covering older time ranges that don’t hit the ingesters.

Our cluster is set up as follows:

We used the production/docker setup in the grafana/loki GitHub repository (loki/production/docker at main) as a starting point for our cluster setup. We are currently running the following in AWS ECS:

  • 3 Loki servers, each running the distributor, ingester, and querier
  • Memberlist for service discovery
  • Boltdb-shipper for the index
  • S3 for chunk/index storage
  • A separate instance running a query frontend
  • Redis for index query/chunk/query results caches

Our Loki instances have the following configuration (the ${…} placeholders are populated at deployment time):

auth_enabled: false

http_prefix:

server:
  http_listen_address: 0.0.0.0
  grpc_listen_address: 0.0.0.0
  http_listen_port: ${loki_http_port}
  grpc_listen_port: ${loki_grpc_port}
  log_level: ${loki_log_level}

memberlist:
  join_members:
    - ${loki_sd_dns}
  abort_if_cluster_join_fails: false
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: ${loki_bind_port}

limits_config:
  ingestion_rate_strategy: local
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 0

ingester:
  lifecycler:
    join_after: 60s
    final_sleep: 0s
    ring:
      replication_factor: 3
      heartbeat_timeout: 60s
      kvstore:
        store: memberlist
  chunk_retain_period: 30s
  chunk_idle_period: 15m
  chunk_block_size: 262144
  chunk_target_size: 1536000
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /loki/wal
    flush_on_shutdown: true
    replay_memory_ceiling: 1GB

distributor:
  ring:
    kvstore:
      store: memberlist

schema_config:
  configs:
  - from: 2021-05-01
    store: boltdb-shipper
    object_store: s3
    schema: v11
    index:
      prefix: loki_index_
      period: 24h

storage_config:
  aws:
    s3: ${loki_s3_bucket}
    sse_encryption: true
    insecure: false
    s3forcepathstyle: true
  boltdb_shipper:
    shared_store: s3
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
  index_cache_validity: 14m
  index_queries_cache_config:
    redis:
      endpoint: ${loki_redis_endpoint}
      timeout: 1s
      db: 1

chunk_store_config:
  max_look_back_period: 8736h
  chunk_cache_config:
    redis:
      endpoint: ${loki_redis_endpoint}
      timeout: 1s
      db: 2

table_manager:
  retention_deletes_enabled: true
  retention_period: 8736h

query_range:
  # make queries more cache-able by aligning them with their step intervals
  align_queries_with_step: true
  max_retries: 5
  # parallelize queries in 15min intervals
  split_queries_by_interval: 15m
  parallelise_shardable_queries: true
  cache_results: true
  results_cache:
    cache:
      redis:
        endpoint: ${loki_redis_endpoint}
        timeout: 1s
        db: 0

frontend:
  log_queries_longer_than: 5s
  compress_responses: true
  tail_proxy_url: ${loki_query_backend_url}

frontend_worker:
  frontend_address: ${loki_query_local_dns}:${loki_grpc_port}
  grpc_client_config:
    max_send_msg_size: 1.048576e+08
  
querier:
  query_ingesters_within: 2h

One problem I suspect is that your chunk_retain_period (30s) is lower than your index_cache_validity (14m), when it should be the other way around. With these settings, stale index results can be served from the cache for up to 14 minutes, while the ingesters drop the backing chunk data 30 seconds after flushing it, so the results would be incomplete. We internally set them to 6 minutes and 5 minutes respectively. You can reduce them to lower values in the same proportion.
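
As a rough sketch of that suggestion, applied to the config posted above: only these two keys would change, and the values are the 6 minute / 5 minute pair mentioned (chunk_retain_period first, index_cache_validity second). Treat it as a starting point rather than a tuned recommendation for your workload; the other keys are left exactly as they are in your config and are omitted here.

```yaml
ingester:
  # Keep flushed chunks in ingester memory a little longer than the
  # index cache can remain stale (was 30s in the posted config).
  chunk_retain_period: 6m

storage_config:
  # Must stay below chunk_retain_period (was 14m in the posted config).
  index_cache_validity: 5m
```

The point is the ordering, chunk_retain_period greater than index_cache_validity, not the absolute numbers.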

Despite making the above config changes, I still see the same issue with search results jumping around on refresh.

Actually, forget that. I think it was caused by two different sets of ingesters being present in the ring during a deployment. Once the older set dropped out, we no longer saw the issue.