Grafana → ELB → three loki nodes <-> S3
Fluentd →
This was our starting configuration when we first stood up Loki.
It has been working for us.
Now that the dust has settled on 2.4, we’re trying to upgrade.
Node one is the zero dependency configuration with our IP addresses and it runs the compactor.
Nodes two and three are the same as node one, minus the compactor.
They were defaulting to target=all in 2.3.
I’ve added target: "all" to node one, and target: "distributor,querier,ingester" to nodes two and three.
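For anyone following along, the per-node change described above would look roughly like the fragments below. This is a sketch of that first attempt, not a recommended layout. It assumes the top-level `target` key from Loki's configuration reference, and the component lists are simply the ones quoted above.

```yaml
# Node one (sketch): every component, including the compactor
target: all
```

```yaml
# Nodes two and three (sketch): write and read path only, no compactor
target: distributor,querier,ingester
```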
The good news: the cluster starts and processes logs.
The bad news: searching from Grafana Explore only works when the load balancer is pointing at node one. I’ve tried adding other targets on nodes two and three, but that ends in an error.
Q1: Is there a functional reference design using multiple nodes behind a LB and not on Kubernetes that somebody can point me to?
Q2: Why is the compactor included in the “all” target when the documentation indicates it’s a singleton?
Q3: Is there a way to “subtract” the compactor from the “all” target? (i.e. “all,!compactor” )
Q4: Any other advice?
Just to circle back on this, I sorted out my configuration by reading the code.
This configuration allows all three nodes to start, run and deal with single node restarts.
The 2.4.2 upgrade plus the configuration changes have also eliminated these errors:
2022-02-21 12:01:18 level=error ts=2022-02-21T17:01:18.322640776Z caller=flush.go:220 org_id=fake msg="failed to flush user" err="RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
Configuration (Terraform template):
# Adapted from https://grafana.com/docs/loki/latest/configuration/examples/#almost-zero-dependencies-setup
auth_enabled: false
common:
  ring:
    kvstore:
      store: memberlist
server:
  http_listen_port: ${http_port}
  grpc_listen_port: ${grpc_port}
  log_level: warn
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: false
query_range:
  results_cache:
    cache:
      enable_fifocache: false
ingester:
  lifecycler:
    ring:
      replication_factor: 1
    final_sleep: 0s
  # https://grafana.com/docs/loki/latest/best-practices/#use-chunk_target_size
  chunk_target_size: 153600
  max_chunk_age: 1h
  flush_check_period: 5s
  flush_op_timeout: 5m
  chunk_idle_period: 30m
  chunk_retain_period: 30s
  # This must be set to 0 to use the WAL
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /data/wal
memberlist:
  abort_if_cluster_join_fails: false
  # Expose this port on all distributor, ingester
  # and querier replicas.
  bind_addr:
    - ${local_ip}
  bind_port: ${bind_port}
  join_members:
    %{ for ip in loki_ips ~}
    - ${ip}
    %{ endfor }
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
  rejoin_interval: 30s
schema_config:
  configs:
    - from: 2020-05-15
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/index
    cache_location: /data/loki/index_cache
    shared_store: s3
  aws:
    bucketnames: ${storage_bucket}
    region: ${region}
    insecure: false
    sse_encryption: true
limits_config:
  retention_period: ${retention_period}
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
compactor:
  working_directory: /data/retention
  shared_store: s3
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
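
To make the template variables easier to picture, here is a sketch of how the memberlist section might look once rendered for a single node. The IP addresses and the port are made-up placeholders, not values from my environment, and the indentation shows the intended result rather than verified Terraform output.

```yaml
# Hypothetical rendered output for one node; all values are placeholders
memberlist:
  abort_if_cluster_join_fails: false
  bind_addr:
    - 10.0.0.11        # ${local_ip} for this node (placeholder)
  bind_port: 7946      # ${bind_port} (placeholder)
  join_members:        # one entry per address in loki_ips (placeholders)
    - 10.0.0.11
    - 10.0.0.12
    - 10.0.0.13
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
  rejoin_interval: 30s
```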