Loki cannot receive high volume logs fast enough

Dear Team Grafana Loki,
We have deployed Promtail across multiple servers to push logs to a central Loki instance, with the intention of displaying these logs in the Grafana UI. Our log volume reaches up to 10,000 lines per second at peak times. Initially, we encountered issues with an “empty ring error” due to the KV Store being in “in-memory” mode, which also caused out-of-memory errors when large volumes of logs were received. We resolved this by switching the KV Store to ETCD mode.
While this change has addressed the empty ring error, we are still facing challenges as the logs displayed in Grafana do not reflect the actual volume being sent. This indicates that Loki is not keeping up with the incoming log volume, resulting in log loss.
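For context, each server pushes logs with a Promtail client block along the following lines; the URL, batch sizes, and backoff values shown here are placeholders (close to the Promtail defaults) rather than our exact settings:

clients:
  - url: http://loki.example.internal:3100/loki/api/v1/push   # placeholder Loki push endpoint
    batchwait: 1s          # wait up to 1s to fill a batch before pushing
    batchsize: 1048576     # flush once the batch reaches ~1 MiB
    backoff_config:
      min_period: 500ms    # retry failed pushes with exponential backoff
      max_period: 5m
      max_retries: 10      # the batch is dropped once retries are exhausted

Once the retries are exhausted Promtail drops the batch, which would be consistent with the loss we observe when Loki cannot keep up.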
Several approaches have already been taken, but the issue still persists. The KV Store has been changed from in-memory to ETCD. The Loki configuration is as follows:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_server_max_recv_msg_size: 16777216
  grpc_server_max_send_msg_size: 16777216
  log_level: debug

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints:
          -

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

query_scheduler:
  max_outstanding_requests_per_tenant: 8192

frontend:
  max_outstanding_per_tenant: 8192
  log_queries_longer_than: 10s
  compress_responses: true

query_range:
  parallelise_shardable_queries: true
  align_queries_with_step: true
  cache_results: true

limits_config:
  split_queries_by_interval: 15m
  max_query_length: 0h
  max_query_parallelism: 32
  ingestion_rate_strategy: local
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64
  max_streams_per_user: 0
  max_entries_limit_per_query: 5000000
  max_global_streams_per_user: 0
  cardinality_limit: 200000

ruler:
  alertmanager_url: http://localhost:9093

table_manager:
  retention_deletes_enabled: true
  retention_period: 2160h

analytics:
  reporting_enabled: false
Although changing the KV store from in-memory to ETCD has resolved the empty ring error, server performance remains poor because the memory leak still persists; thanks to ETCD, however, the empty ring error no longer occurs. A screenshot from Zabbix showing the server's memory usage is attached in the comment below for reference.
We are running Loki and Grafana on a virtual machine equipped with 16 cores and 64 GB of RAM.
Could you please provide guidance on further optimizations or configurations that could help us manage our logging needs effectively? The following screenshot shows the performance of the server hosting Loki.

I’d recommend considering a move to simple scalable mode with an object storage backend.
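As a rough sketch (not a drop-in config: the MinIO endpoint, credentials, hostnames, and schema cut-over date below are placeholders), each instance would run the same binary with a different -target (write, read, backend) against shared object storage, using memberlist instead of etcd for the ring:

# Shared file for all instances; start each one with e.g.
#   loki -config.file=loki.yaml -target=write   (and -target=read, -target=backend)
auth_enabled: false

common:
  path_prefix: /loki
  replication_factor: 1                 # 1 for a minimal setup
  storage:
    s3:
      endpoint: minio.example.internal:9000   # placeholder S3-compatible object store
      bucketnames: loki-data
      access_key_id: <ACCESS_KEY>
      secret_access_key: <SECRET_KEY>
      s3forcepathstyle: true
      insecure: true
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - loki-write-1.example.internal     # placeholder peer addresses
    - loki-read-1.example.internal

schema_config:
  configs:
    - from: 2024-01-01                  # placeholder cut-over date for the new object store
      store: tsdb
      object_store: s3
      schema: v12                       # keeping the schema version already in use
      index:
        prefix: index_
        period: 24h

The write path can then be scaled out independently when ingestion spikes, and chunks are persisted to object storage rather than the local filesystem.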