Loki cannot receive high volume logs fast enough

Dear Team Grafana Loki,
We have deployed Promtail across multiple servers to push logs to a central Loki instance, with the intention of displaying these logs in the Grafana UI. Our log volume reaches up to 10,000 lines per second at peak times. Initially, we encountered issues with an “empty ring error” due to the KV Store being in “in-memory” mode, which also caused out-of-memory errors when large volumes of logs were received. We resolved this by switching the KV Store to ETCD mode.
While this change has addressed the empty ring error, we are still facing challenges as the logs displayed in Grafana do not reflect the actual volume being sent. This indicates that Loki is not keeping up with the incoming log volume, resulting in log loss.
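For context, each server pushes logs with a Promtail client block along the following lines; the URL, batch sizes, and backoff values shown here are placeholders (close to the Promtail defaults) rather than our exact settings:

clients:
  - url: http://loki.example.internal:3100/loki/api/v1/push   # placeholder Loki push endpoint
    batchwait: 1s          # wait up to 1s to fill a batch before pushing
    batchsize: 1048576     # flush once the batch reaches ~1 MiB
    backoff_config:
      min_period: 500ms    # retry failed pushes with exponential backoff
      max_period: 5m
      max_retries: 10      # the batch is dropped once retries are exhausted

Once the retries are exhausted Promtail drops the batch, which would be consistent with the loss we observe when Loki cannot keep up.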
Several approaches have already been taken, but the issue still persists. The KV Store has been changed from in-memory to ETCD. The Loki configuration is as follows:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_server_max_recv_msg_size: 16777216
  grpc_server_max_send_msg_size: 16777216
  log_level: debug

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints:
          -

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

query_scheduler:
  max_outstanding_requests_per_tenant: 8192

frontend:
  max_outstanding_per_tenant: 8192
  log_queries_longer_than: 10s
  compress_responses: true

query_range:
  parallelise_shardable_queries: true
  align_queries_with_step: true
  cache_results: true

limits_config:
  split_queries_by_interval: 15m
  max_query_length: 0h
  max_query_parallelism: 32
  ingestion_rate_strategy: local
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64
  max_streams_per_user: 0
  max_entries_limit_per_query: 5000000
  max_global_streams_per_user: 0
  cardinality_limit: 200000

ruler:
  alertmanager_url: http://localhost:9093

table_manager:
  retention_deletes_enabled: true
  retention_period: 2160h

analytics:
  reporting_enabled: false
Although changing the KV store from in-memory to ETCD has resolved the empty ring error, server performance remains poor because the memory leak still persists; thanks to ETCD, however, the empty ring error no longer occurs. A screenshot from Zabbix showing the server's memory usage is attached in the comment below for reference.
We are running Loki and Grafana on a virtual machine equipped with 16 cores and 64 GB of RAM.
Could you please provide guidance on further optimizations or configurations that could help us manage our logging needs effectively? The following screenshot shows the performance of the server hosting Loki.

I’d recommend considering a move to simple scalable mode with an object storage backend.
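As a rough sketch (not a drop-in config: the MinIO endpoint, credentials, hostnames, and schema cut-over date below are placeholders), each instance would run the same binary with a different -target (write, read, backend) against shared object storage, using memberlist instead of etcd for the ring:

# Shared file for all instances; start each one with e.g.
#   loki -config.file=loki.yaml -target=write   (and -target=read, -target=backend)
auth_enabled: false

common:
  path_prefix: /loki
  replication_factor: 1                 # 1 for a minimal setup
  storage:
    s3:
      endpoint: minio.example.internal:9000   # placeholder S3-compatible object store
      bucketnames: loki-data
      access_key_id: <ACCESS_KEY>
      secret_access_key: <SECRET_KEY>
      s3forcepathstyle: true
      insecure: true
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - loki-write-1.example.internal     # placeholder peer addresses
    - loki-read-1.example.internal

schema_config:
  configs:
    - from: 2024-01-01                  # placeholder cut-over date for the new object store
      store: tsdb
      object_store: s3
      schema: v12                       # keeping the schema version already in use
      index:
        prefix: index_
        period: 24h

The write path can then be scaled out independently when ingestion spikes, and chunks are persisted to object storage rather than the local filesystem.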