Network traffic imbalance across ingesters

New to Loki, thanks in advance.

Deployments

I deploy a Loki cluster by separating the components into different pods:

  • distributor * 3
  • ingester * 3
  • custom-loki-client * 2

Config

common

auth_enabled: false
http_prefix:
common:
  path_prefix: /data/loki
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist

server:
  http_listen_address: 0.0.0.0
  grpc_listen_address: 0.0.0.0
  grpc_server_max_recv_msg_size: 41943040
  grpc_server_max_send_msg_size: 41943040
  http_listen_port: 3100
  grpc_listen_port: 9095
  log_level: info

ingester:
  lifecycler:
    join_after: 10s
    observe_period: 5s
  wal:
    enabled: true
    dir: /data/loki/wal
    flush_on_shutdown: true
  max_chunk_age: 1h
  chunk_retain_period: 30s
  chunk_encoding: snappy
  chunk_target_size: 0
  chunk_block_size: 104857600

memberlist:
  join_members: ["${RING_DOMAIN:-127.0.0.1}"] # same ENV value on both ingester and distributor
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: 7946

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

limits_config:
  ingestion_rate_strategy: local
  ingestion_rate_mb: 1024
  ingestion_burst_size_mb: 2048
  per_stream_rate_limit: 1024MB
  per_stream_rate_limit_burst: 2048MB
  max_global_streams_per_user: 0

storage_config:
  boltdb_shipper:
    shared_store: filesystem
    shared_store_key_prefix: index/
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
  filesystem:
    directory: /dataceph/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 48h

distributor

target: distributor

<config-common>

ingester

target: ingester

<config-common>

Ring Status

It seems that token ownership is balanced across all ingester instances.

Issue Description

In summary, network traffic is spread evenly across the distributors, but it is imbalanced across the ingesters.
Upon closer examination with itop, it is apparent that the imbalanced traffic originates from several distributors and converges on a single ingester via port 9095.
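
To make the comparison more concrete, a quick way to check is to see how many in-memory streams each ingester is holding. Below is a rough Go sketch of that check (the pod addresses are placeholders, and it assumes the standard loki_ingester_memory_streams gauge is exposed on each ingester's HTTP port):

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
)

func main() {
    // Placeholder addresses; substitute the real ingester pod IPs / service names.
    ingesters := []string{
        "http://ingester-0:3100/metrics",
        "http://ingester-1:3100/metrics",
        "http://ingester-2:3100/metrics",
    }
    for _, url := range ingesters {
        resp, err := http.Get(url)
        if err != nil {
            fmt.Printf("%s: %v\n", url, err)
            continue
        }
        scanner := bufio.NewScanner(resp.Body)
        for scanner.Scan() {
            line := scanner.Text()
            // loki_ingester_memory_streams reports how many streams the ingester
            // currently holds in memory; a lopsided count across the three pods
            // would match the skewed traffic seen on port 9095.
            if strings.HasPrefix(line, "loki_ingester_memory_streams ") ||
                strings.HasPrefix(line, "loki_ingester_memory_streams{") {
                fmt.Printf("%s -> %s\n", url, line)
            }
        }
        resp.Body.Close()
    }
}

If one pod reports far more streams than the others, the imbalance is happening at the stream-placement level rather than being a networking issue.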

Please give me some advice, thanks again.

I think this is a known “potential” problem with ingesters. It happens because Loki tries to limit the number of files written to storage: in distributed processing it is often faster to read bigger files fewer times than smaller files more times. To achieve that, the distributor routes data to ring members based on labels, so that logs from the same stream (the same set of labels) always end up on the same ingester and the number of files that need to be written is kept to a minimum.
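
To make that concrete, here is a minimal Go sketch of the idea (this is not Loki's actual code; the tenant name, hash function and modulo lookup are simplifications of the real ring lookup):

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
    "strings"
)

// token maps a stream's label set to a position on the hash ring, in a
// simplified form: sort the labels, concatenate them with the tenant,
// and hash the result.
func token(tenant string, labels map[string]string) uint32 {
    keys := make([]string, 0, len(labels))
    for k := range labels {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    var b strings.Builder
    b.WriteString(tenant)
    for _, k := range keys {
        b.WriteString(k)
        b.WriteString("=")
        b.WriteString(labels[k])
        b.WriteString(",")
    }
    h := fnv.New32a()
    h.Write([]byte(b.String()))
    return h.Sum32()
}

func main() {
    ingesters := 3
    streams := []map[string]string{
        {"job": "nginx", "level": "info"},
        {"job": "nginx", "level": "error"},
        {"job": "app", "host": "a"},
        {"job": "app", "host": "b"},
    }
    for _, lbls := range streams {
        t := token("fake", lbls)
        // With replication_factor: 1 each stream lands on exactly one ingester;
        // modulo stands in for the real token-range lookup on the ring.
        fmt.Printf("%v -> token %d -> ingester %d\n", lbls, t, t%uint32(ingesters))
    }
}

Because the token depends only on the tenant and the label set, every log line from a given stream goes to the same ingester; if a handful of label sets carry most of your volume, all of that volume converges on whichever ingester owns their tokens, which is exactly the pattern seen in itop.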

This is the sort of problem that becomes less noticeable the more ingesters you have (not saying you need to go crazy; we run 4 ourselves), and as long as your logs aren't too unbalanced it probably won't be an actual problem.
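
As a back-of-the-envelope illustration (made-up stream counts and volumes, and simple modulo placement instead of the real token ranges), the toy calculation below compares the busiest ingester's share of traffic for a workload of similarly sized streams versus one where a single stream dominates:

package main

import (
    "fmt"
    "hash/fnv"
)

// ingesterFor hashes a stream name to one of n ingesters (simplified ring lookup).
func ingesterFor(stream string, n int) int {
    h := fnv.New32a()
    h.Write([]byte(stream))
    return int(h.Sum32() % uint32(n))
}

// busiestShare returns the fraction of total volume received by the busiest
// ingester when the given per-stream volumes are spread over n ingesters.
func busiestShare(volumes map[string]int, n int) float64 {
    load := make([]int, n)
    total := 0
    for stream, vol := range volumes {
        load[ingesterFor(stream, n)] += vol
        total += vol
    }
    max := 0
    for _, l := range load {
        if l > max {
            max = l
        }
    }
    return float64(max) / float64(total)
}

func main() {
    // Scenario A: many streams of similar size.
    balanced := map[string]int{}
    for i := 0; i < 300; i++ {
        balanced[fmt.Sprintf("app-%d", i)] = 10
    }
    // Scenario B: the same streams plus one stream that dominates the volume.
    skewed := map[string]int{"chatty-app": 3000}
    for k, v := range balanced {
        skewed[k] = v
    }
    for _, n := range []int{3, 8} {
        fmt.Printf("%d ingesters: balanced=%.0f%%, one dominant stream=%.0f%%\n",
            n, 100*busiestShare(balanced, n), 100*busiestShare(skewed, n))
    }
}

With similarly sized streams, adding ingesters shrinks the busiest ingester's share; with one dominant stream, its ingester stays busy no matter how many members are in the ring, which is why the balance of your streams matters as much as the ingester count.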
