Hello! I would like some advice on memory usage of the Loki ingester component.
I have the following setup: Loki distributed v2.6.1, installed through the official Helm chart in K8s.
There are ~1000 Promtail clients (hosts), and each of them generates a heavy load. There are about 5 million chunks (see screenshot below).
The loki_log_messages_total metric is 175 million per day.
My problem is that the ingester uses about 100 GB of RAM per day. I want to understand whether this is normal behavior or whether I can reduce memory usage through the config. I tried adjusting various parameters myself, in particular chunk_idle_period and max_chunk_age, but no matter what values I set, consumption stays at around 100 GB.
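For a sense of scale, here is a rough back-of-envelope calculation based only on the numbers above. It is a sketch, not a measurement: it assumes that 100 GB means decimal gigabytes and that all ~5 million chunks are resident in ingester memory at the same time, which may not be the case. The variable names are only illustrative.

```python
# Back-of-envelope figures derived from the numbers quoted in this post.
MESSAGES_PER_DAY = 175_000_000      # loki_log_messages_total per day
PROMTAIL_HOSTS = 1_000              # approximate number of Promtail clients
CHUNKS = 5_000_000                  # approximate chunk count
INGESTER_RAM_BYTES = 100 * 10**9    # ~100 GB, assumed to be decimal gigabytes

SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate across all hosts.
messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Average number of messages each host sends per day.
messages_per_host_per_day = MESSAGES_PER_DAY / PROMTAIL_HOSTS

# Assumption: every chunk is held in ingester memory at once.
bytes_per_chunk = INGESTER_RAM_BYTES / CHUNKS

print(f"{messages_per_second:.0f} messages/s overall")
print(f"{messages_per_host_per_day:.0f} messages/day per host")
print(f"{bytes_per_chunk / 1000:.0f} KB of RAM per chunk")
```

Under those assumptions the average works out to roughly 20 KB of RAM per chunk, so the total seems to be driven by the number of chunks rather than by the size of any single one.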
Here is my config:
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  retention_enabled: true
  shared_store: s3
  working_directory: /var/loki/compactor
distributor:
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  tail_proxy_url: http://loki-distributed-querier:3100
frontend_worker:
  frontend_address: loki-distributed-query-frontend:9095
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
ingester:
  autoforget_unhealthy: true
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  max_chunk_age: 15m
  max_transfer_retries: 0
  wal:
    enabled: false
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
limits_config:
  cardinality_limit: 500000
  enforce_metric_name: false
  ingestion_burst_size_mb: 300
  ingestion_rate_mb: 150
  max_cache_freshness_per_query: 10m
  max_entries_limit_per_query: 1000000
  max_global_streams_per_user: 5000000
  max_label_name_length: 1024
  max_label_names_per_series: 300
  max_label_value_length: 8096
  max_query_series: 250000
  per_stream_rate_limit: 150M
  per_stream_rate_limit_burst: 300M
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 72h
  split_queries_by_interval: 30m
memberlist:
  join_members:
  - loki-distributed-memberlist
querier:
  engine:
    timeout: 5m
  query_timeout: 5m
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        ttl: 24h
query_scheduler:
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
runtime_config:
  file: /var/loki-distributed-runtime/runtime.yaml
schema_config:
  configs:
  - from: "2022-09-07"
    index:
      period: 24h
      prefix: loki_index_
    object_store: aws
    schema: v12
    store: boltdb-shipper
server:
  grpc_server_max_recv_msg_size: 167772160
  grpc_server_max_send_msg_size: 167772160
  http_listen_port: 3100
  http_server_idle_timeout: 300s
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
storage_config:
  aws:
    s3: https:/....
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb_shipper/index
    cache_location: /var/loki/boltdb_shipper/cache
    shared_store: s3
  index_cache_validity: 5m
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
I have not found any examples or guidance for heavy loads in the documentation, so I decided to ask the community. I would be very grateful for any help.