Ingester high memory usage

Hello! Please advise on memory usage of the Loki ingester component.

I have the following setup: Loki distributed v2.6.1, installed through the official Helm chart in Kubernetes.

There are ~1000 promtail clients, each generating a heavy load: about 5 million chunks in total (see screenshot below).


loki_log_messages_total comes to about 175 million per day.

My problem is that the ingesters use about 100 GB of RAM per day. I want to understand whether this is normal behavior, or whether I can reduce memory usage through the config. I have tried adjusting various parameters myself, in particular chunk_idle_period and max_chunk_age, but no matter what values I set, consumption stays at around 100 GB.
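
For reference, one of those attempts looked roughly like this (the values below are just an illustration of a single experiment, not a recommendation and not what is currently deployed):

ingester:
  # Flush a chunk when no new entries have arrived for this long
  chunk_idle_period: 2m
  # Force-flush chunks once they reach this age, even if they are still receiving entries
  max_chunk_age: 10m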

Here is my config:

auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  retention_enabled: true
  shared_store: s3
  working_directory: /var/loki/compactor
distributor:
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  tail_proxy_url: http://loki-distributed-querier:3100
frontend_worker:
  frontend_address: loki-distributed-query-frontend:9095
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
ingester:
  autoforget_unhealthy: true
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  max_chunk_age: 15m
  max_transfer_retries: 0
  wal:
    enabled: false
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
limits_config:
  cardinality_limit: 500000
  enforce_metric_name: false
  ingestion_burst_size_mb: 300
  ingestion_rate_mb: 150
  max_cache_freshness_per_query: 10m
  max_entries_limit_per_query: 1000000
  max_global_streams_per_user: 5000000
  max_label_name_length: 1024
  max_label_names_per_series: 300
  max_label_value_length: 8096
  max_query_series: 250000
  per_stream_rate_limit: 150M
  per_stream_rate_limit_burst: 300M
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 72h
  split_queries_by_interval: 30m
memberlist:
  join_members:
  - loki-distributed-memberlist
querier:
  engine:
    timeout: 5m
  query_timeout: 5m
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        ttl: 24h
query_scheduler:
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
runtime_config:
  file: /var/loki-distributed-runtime/runtime.yaml
schema_config:
  configs:
  - from: "2022-09-07"
    index:
      period: 24h
      prefix: loki_index_
    object_store: aws
    schema: v12
    store: boltdb-shipper
server:
  grpc_server_max_recv_msg_size: 167772160
  grpc_server_max_send_msg_size: 167772160
  http_listen_port: 3100
  http_server_idle_timeout: 300s
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
storage_config:
  aws:
    s3: https:/....
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb_shipper/index
    cache_location: /var/loki/boltdb_shipper/cache
    shared_store: s3
  index_cache_validity: 5m
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
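
One thing I am unsure about is the role of stream cardinality: as far as I understand, ingester memory grows with the number of active streams held in memory, and my max_global_streams_per_user is set very high. A more conservative variant of that limit might look like this (the value below is hypothetical and untested on my side):

limits_config:
  # Hypothetical lower ceiling on active streams per tenant (my current config uses 5000000)
  max_global_streams_per_user: 500000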

I have not found any examples or guidance for heavy loads in the documentation, so I decided to ask the community. I would be very grateful for any help.


Hi,
Can somebody please respond to this? We are seeing a similar issue where the ingester pods use a lot of RAM, and scaling out the ingester pods doesn't seem to help either. Although not as much as what @luxit mentioned, we also have a large amount of logs being ingested into Loki.
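
For context, scaling out on our side was done through the chart values, roughly like this (the numbers are illustrative and the key names are assumed from the loki-distributed Helm chart):

ingester:
  # Illustrative replica count; raising it spread the load but did not noticeably reduce total memory
  replicas: 6
  resources:
    limits:
      memory: 16Gi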

Any tuning recommendations for scaling Loki in a large production cluster … Thanks
