Loki Simple Scalable Deployment Read Node Discrepancy

Issue: Our simple scalable Loki deployment (a VM cluster behind Consul, not Kubernetes) is suffering from an odd issue where one read node gets hammered significantly harder than the others. If we remove that node from the memberlist, another node takes its place as the hot node, so regardless of which nodes are in the ring, one read node is always hit significantly harder.

For example:
Read node total: 5
Read node 1: 3GB over 5m
Read nodes 2-5: 250MB over 5m

Here is our configuration:

----------------------------------------------------------------------------

Loki Settings

----------------------------------------------------------------------------

loki_version: "2.8.2"

loki_auth_url: loki.apps.it.ufl.edu
loki_cert: "{{ lookup('hashi_vault', 'secret/data/services/certs/ce/*.loki.apps.it.ufl.edu:fullchain') }}"
loki_key: "{{ lookup('hashi_vault', 'secret/data/services/certs/ce/*.loki.apps.it.ufl.edu:key') }}"
loki_consul_url: loki.service.{{ deployment.environment }}.consul.it.ufl.edu

loki_http_port: 8443
loki_grpc_port: 9443

# set the number of cores to use for loki equal to the number of vcpus on the host,
# and set the garbage collection to be more aggressive
loki_systemd_environment: >-
  GOMAXPROCS={{ ansible_processor_vcpus | default(ansible_processor_count) }}
  GOGC=20
loki_system_user: loki
loki_system_group: loki
loki_config_dir: /etc/loki
loki_storage_dir: /loki

loki_auth_enabled: true

# Max allowed throughput for a tenant/user/org-id per node. This value seems to be a pretty good
# balance between giving people enough breathing room to send tons of logs while at the same time
# preventing them from taking down the cluster if they try and send too much on a single tenant.
# If someone needs to push more than this, consider adjusting the value for their tenant
# specifically rather than adjusting it globally here.
_max_tenant_throughput_mb: 40

# Recommended burst value is 1.5x the max throughput value
_max_tenant_throughput_burst_mb: 60

# Max allowed query timeout in seconds. Adjust this value here rather than in the loki config below.
# Note you also probably need to adjust the timeout value in grafana.ini under dataproxy.
_max_query_timeout: 300

loki_config:
  common:
    # replication_factor needed for high availability.
    # data is sent to replication_factor nodes.
    # writes need a quorum of floor(replication_factor / 2) + 1 nodes,
    # so the ring tolerates floor(replication_factor / 2) node failures
    replication_factor: 3
    ring:
      kvstore:
        store: memberlist
      heartbeat_timeout: 10m
    storage:
      s3:
        bucketnames: loki
        endpoint: object-prod.ceph.apps.it.ufl.edu
        region: default
        access_key_id: ""
        secret_access_key: ""
        insecure: false
        s3forcepathstyle: true
        http_config:
          insecure_skip_verify: true

  server:
    log_level: info
    http_listen_port: "{{ loki_http_port }}"
    http_tls_config:
      cert_file: "{{ loki_config_dir }}/ssl/cert.crt"
      key_file: "{{ loki_config_dir }}/ssl/cert.key"
    # 10 seconds longer than the query timeout so that the client can get an informative
    # error message from loki rather than an http timeout
    http_server_read_timeout: "{{ _max_query_timeout + 10 }}s"
    http_server_write_timeout: "{{ _max_query_timeout + 10 }}s"

    grpc_listen_port: "{{ loki_grpc_port }}"
    # increase grpc limits from default
    grpc_server_max_recv_msg_size: 104857600
    grpc_server_max_send_msg_size: 104857600
    grpc_server_max_concurrent_streams: 1000

  ingester:
    # Recommended 1-2h.
    # Setting this too low will cause too much data in the index and too much churn
    chunk_idle_period: 1h
    # Reduce burden on disk writes
    flush_check_period: 10s
    max_chunk_age: 2h
    wal:
      # according to the Write Ahead Log page in the Grafana Loki documentation,
      # replay_memory_ceiling should be set to 75% of available memory
      replay_memory_ceiling: "{{ (ansible_memtotal_mb * 0.75) | int }}MB"

  querier:
    multi_tenant_queries_enabled: true
    max_concurrent: 16

  memberlist:
    abort_if_cluster_join_fails: false
    bind_port: 7946

    # explicitly map out all the loki nodes rather than use consul, since consul is configured
    # to only return a subset of the nodes when doing a DNS lookup
    join_members: "{{ groups['loki'] }}"

    max_join_backoff: 1m
    max_join_retries: 10
    min_join_backoff: 1s
    # automatically attempt to rejoin the cluster if disconnected; helps prevent split brain
    rejoin_interval: 1m

  schema_config:
    configs:
      - from: "2020-05-15"
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h

      - from: "2023-03-05"
        store: tsdb
        object_store: s3
        schema: v12
        index:
          prefix: index_tsdb_
          period: 24h

  storage_config:
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3

    boltdb_shipper:
      active_index_directory: "{{ loki_storage_dir }}/boltdb-shipper-active"
      cache_location: "{{ loki_storage_dir }}/boltdb-shipper-cache"
      # Can be increased for faster performance over longer query periods; uses more disk space
      cache_ttl: 24h
      shared_store: s3

    tsdb_shipper:
      active_index_directory: "{{ loki_storage_dir }}/tsdb-shipper-active"
      cache_location: "{{ loki_storage_dir }}/tsdb-shipper-cache"
      shared_store: s3

  # query frontend
  frontend:
    log_queries_longer_than: 15s
    compress_responses: true

  frontend_worker:
    grpc_client_config:
      max_send_msg_size: 104857600
    parallelism: 12

  query_range:
    align_queries_with_step: true
    max_retries: 5
    cache_results: true
    results_cache:
      cache:
        embedded_cache:
          enabled: true
          max_size_mb: 2048
          ttl: 1h

  query_scheduler:
    # needed to avoid getting "429 too many requests" errors when querying
    max_outstanding_requests_per_tenant: 32768

  limits_config:
    enforce_metric_name: false
    # Throughput for a tenant/user/org-id per node
    ingestion_rate_mb: "{{ _max_tenant_throughput_mb }}"
    ingestion_burst_size_mb: "{{ _max_tenant_throughput_burst_mb }}"
    # throughput for a single log stream (i.e. unique selection of label keys and values) per node;
    # just set it to the same as ingestion_rate_mb so users can use all their quota for a single
    # stream if they want to
    per_stream_rate_limit: "{{ _max_tenant_throughput_mb }}MB"
    per_stream_rate_limit_burst: "{{ _max_tenant_throughput_burst_mb }}MB"
    # max log entries that can be returned for a query
    max_entries_limit_per_query: 100000
    # when using parsers at query time, this limits how many streams the
    # result can be split up into. 5000-20000 is a reasonable range
    max_global_streams_per_user: 20000
    retention_period: 2w
    # how long a read query can run before being cancelled
    query_timeout: "{{ _max_query_timeout }}s"
    # don't allow caching results from within the last "max_cache_freshness_per_query" duration,
    # to prevent caching very recent results that are likely to change
    max_cache_freshness_per_query: "10m"
    # parallelize queries in 10m intervals
    split_queries_by_interval: 10m
    # limit how far back we will accept logs
    reject_old_samples: true
    # query frontend parallelism
    max_query_parallelism: 32

  compactor:
    working_directory: "{{ loki_storage_dir }}/compactor"
    shared_store: s3
    compaction_interval: 1m
    retention_enabled: true
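As an aside on the ring settings above: when every node joins the memberlist ring with the same number of tokens, ownership should spread roughly evenly. The toy consistent-hashing sketch below (illustrative only; this is not Loki's actual implementation, and the node names are made up) shows the evenness we would expect from a healthy five-node read ring:

```python
# Toy consistent-hash ring: each node claims many pseudo-random tokens,
# and a key is owned by the first token clockwise from the key's hash.
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, tokens_per_node=128):
        # Many tokens per node smooths out ownership across the hash space.
        self._tokens = sorted(
            (_hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self._keys = [token for token, _ in self._tokens]

    def owner(self, key: str) -> str:
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._tokens)
        return self._tokens[idx][1]


ring = Ring([f"read-{n}" for n in range(1, 6)])
counts = {}
for i in range(10_000):
    node = ring.owner(f"query-{i}")
    counts[node] = counts.get(node, 0) + 1
print(counts)  # roughly even split across read-1..read-5
```

If the ring itself were healthy, we would expect our read traffic to look like this rather than the 12:1 skew we are seeing, which is why we suspect something outside the ring (e.g. the query path) is funneling work to one node.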

We are using Consul for service mesh with the following host configuration:

{
  "acl": {
    "default_policy": "allow",
    "down_policy": "extend-cache",
    "enable_token_persistence": true,
    "enabled": true,
    "token_ttl": "30s",
    "tokens": {
      "default": "c90c60ad-3267-591f-fe48-ff5a6e4e0d32"
    }
  },
  "addresses": {
    "dns": "127.0.0.1",
    "grpc": "127.0.0.1",
    "http": "127.0.0.1",
    "https": "127.0.0.1"
  },
  "advertise_addr": "10.51.31.35",
  "advertise_addr_wan": "10.51.31.35",
  "auto_encrypt": {
    "tls": true
  },
  "bind_addr": "0.0.0.0",
  "client_addr": "127.0.0.1",
  "data_dir": "/opt/consul",
  "datacenter": "dc",
  "disable_update_check": false,
  "domain": "<consul_domain>",
  "enable_local_script_checks": false,
  "enable_script_checks": false,
  "encrypt": "",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "log_file": "/var/log/consul/consul.log",
  "log_level": "INFO",
  "log_rotate_bytes": 0,
  "log_rotate_duration": "24h",
  "log_rotate_max_files": 0,
  "node_name": "az1-ce-o11y-prod-loki-read-03",
  "performance": {
    "leave_drain_time": "5s",
    "raft_multiplier": 1,
    "rpc_hold_timeout": "7s"
  },
  "ports": {
    "dns": 8600,
    "grpc": 8502,
    "http": 8500,
    "https": 8501,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "primary_datacenter": "dc",
  "raft_protocol": 3,
  "recursors": [
    "128.227.30.252",
    "8.6.245.30",
    "8.8.8.8"
  ],
  "retry_interval": "30s",
  "retry_join": [
    "10.51.23.20",
    "10.51.148.20",
    "10.243.95.20"
  ],
  "retry_max": 0,
  "server": false,
  "tls": {
    "defaults": {
      "ca_file": "/etc/consul/ssl/ca.pem",
      "tls_min_version": "TLSv1_2",
      "verify_incoming": false,
      "verify_outgoing": true
    },
    "https": {
      "verify_incoming": false
    },
    "internal_rpc": {
      "verify_incoming": false,
      "verify_server_hostname": true
    }
  },
  "translate_wan_addrs": false,
  "ui_config": {
    "enabled": true
  }
}

Please let me know if any adjustments should be made to our configuration to balance traffic evenly among the read nodes.

Thanks!

I think your query frontend may not be configured properly. There are two ways to configure the query frontend, pull mode or push mode, and I see neither in your configuration. See Query frontend example | Grafana Loki documentation.

We are running our Loki cluster in "simple scalable deployment" mode and outside of a Kubernetes cluster, but all of the documentation seems tailored towards K8s. Can you help me identify what should be added to this configuration to get the query frontend working?

It's in the link provided above. The configuration of the query frontend has no relation to how you deploy it.
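For anyone finding this later, here is a minimal sketch of the two wirings that doc page describes. This is illustrative only: the hostnames are placeholders, the ports are the ones from the configuration earlier in this thread, and you should substitute whatever address actually resolves to your frontend (or querier) nodes.

```yaml
# Pull mode: each querier's worker dials the frontend's gRPC endpoint
# and pulls queries from its queue. "loki-frontend.example.com" is a
# placeholder for an address resolving to your query-frontend node(s).
frontend_worker:
  frontend_address: loki-frontend.example.com:9443

# Push mode (alternative): the frontend instead proxies each sub-query
# to a downstream querier over HTTP. Configure one mode or the other,
# not both.
frontend:
  downstream_url: https://loki-querier.example.com:8443
```

Without one of these, each frontend can only be drained by whatever querier happens to reach it, which is consistent with one read node absorbing most of the query traffic.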