Loki server with active/passive mode

Hi Guys,

I’m wondering whether it is a reasonable approach to run the Loki server in active/passive mode.
I have already tested it and everything seems to be OK.
I have an nginx reverse proxy in active/passive mode in front of two Loki servers with the same config and a shared NFS disk.
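
Roughly, the nginx side is just a primary upstream with a backup; a minimal sketch, where the hostnames and the front-end listen port are placeholders rather than my exact config (only the Loki HTTP port 3200 matches the config further down):

upstream loki_backend {
    server grafana-loki-01:3200;          # active instance
    server grafana-loki-02:3200 backup;   # passive instance, used only when the active one is down
}

server {
    listen 3100;
    location / {
        proxy_pass http://loki_backend;
        proxy_http_version 1.1;
    }
}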

Please share your topology a bit. It’s not clear what you mean by active/passive.

Here are my opinions: there is no master/slave distinction in Loki. All components that need to form a cluster do so by joining a membership ring. If you have two ingesters or two query frontends/readers and you can send traffic to both of them, there is simply no reason to do active/passive over active/active.
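
For comparison, an active/active front end would simply load-balance across both instances instead of marking one as backup; a minimal sketch, with placeholder hostnames:

upstream loki_backend {
    server grafana-loki-01:3200;   # both members receive traffic
    server grafana-loki-02:3200;   # round-robin by default
}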

Hi @tonyswumac,

The idea behind installing Loki in active/passive mode was to reduce errors.
When I use cluster mode (membership ring) I notice errors in the logs like:
msg="failed to flush" err="failed to flush chunks: store put chunk: timeout, num_chunks: 1,
msg="error syncing local boltdb files with storage"
component=tsdb-head-manager msg="failed stopping wal" period=1903908 err="write /loki/tsdb-shipper-active/wal/filesystem_2020-10-24/1713517247/00000000: stale NFS file handle"

There were no errors while one of the Loki instances was stopped, but as soon as both instances are running, the errors come back.

I assume these errors are related to the shared NFS storage and can be ignored? WDYT?


auth_enabled: false

server:
  http_listen_port: 3200
  grpc_listen_port: 9096
  http_server_read_timeout: 300s   # allow longer time span queries
  http_server_write_timeout: 300s  # allow longer time span queries
  grpc_server_max_recv_msg_size: 33554432  # 32 MiB (in bytes), default 4 MiB
  grpc_server_max_send_msg_size: 33554432  # 32 MiB (in bytes), default 4 MiB
  log_level: info

common:
  #instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

querier:
  max_concurrent: 16  # roughly twice the number of CPU cores is recommended; the default is 10

query_scheduler:
  max_outstanding_requests_per_tenant: 2048

and the HA config (the same config, with only the settings below changed):


replication_factor: 1
ring:
  kvstore:
    store: memberlist

memberlist:
  join_members:
    - grafana-loki-01:7946
    - grafana-loki-02:7946

I don’t have any experience running Loki on NFS, but I would recommend a couple of things to check:

  1. If your intent is to use NFS as your permanent storage, make sure that when configuring filesystem storage only the index and chunk directories are NFS mounts. You don’t need ephemeral directories such as the WAL to be on NFS.

  2. Try to tune your chunk age and idle period to reduce the number of files written (a rough sketch of both points follows below).
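
For illustration, a rough sketch of what both points could look like in Loki config terms; the mount paths and durations here are assumptions for the example, not tested or recommended values for your workload:

common:
  path_prefix: /loki-local                     # local disk: the WAL and tsdb-shipper-active dirs live under here
  storage:
    filesystem:
      chunks_directory: /mnt/nfs/loki/chunks   # NFS: long-term chunk storage
      rules_directory: /mnt/nfs/loki/rules     # NFS: rules

ingester:
  wal:
    enabled: true
    dir: /loki-local/wal   # keep the WAL off NFS (the "stale NFS file handle" error above came from a WAL write)
  chunk_idle_period: 1h    # flush idle chunks less aggressively
  max_chunk_age: 2h        # cut fewer, larger chunks, i.e. fewer files written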