Issues upgrading three Loki nodes from 2.3.0 to 2.4.2

Grafana → ELB → three Loki nodes ↔ S3
Fluentd →

This was our starting configuration when we first stood up Loki, and it has been working well for us.
Now that the dust has settled on 2.4, we’re trying to upgrade.

Node one is the zero-dependency configuration with our IP addresses, and it runs the compactor.
Nodes two and three are the same as node one, minus the compactor.

They were defaulting to target=all in 2.3

I’ve added target: "all" to node one, and target: "distributor,querier,ingester" to nodes two and three.

The good news - the cluster starts and processes logs.
The bad news - searching from Grafana Explore only works when the load balancer is pointing at node one. I’ve tried adding other targets on nodes two and three, but that ends in errors.
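For anyone debugging something similar: each Loki node exposes HTTP endpoints you can hit directly, bypassing the ELB, to see whether the query path is healthy on that specific node. A rough sketch (the hostnames and port 3100 are placeholders for your own values):

```shell
# Query each node directly, bypassing the load balancer (hosts/port are placeholders).
for node in loki-1.internal loki-2.internal loki-3.internal; do
  echo "== ${node} =="
  # /ready returns "ready" once the node's modules are up and it has joined the ring
  curl -s --max-time 5 "http://${node}:3100/ready" || echo "unreachable"
  # /ring renders the ingester ring as this node sees it; all three ingesters
  # should show as ACTIVE from every node
  curl -s --max-time 5 "http://${node}:3100/ring" | grep -c ACTIVE || true
done
```

If a node running a querier can't see the full ingester ring, queries routed to it will error or come back empty, which matches searches only working when the LB points at node one.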

Q1: Is there a working reference design for multiple Loki nodes behind a load balancer, outside Kubernetes, that somebody can point me to?

Q2: Why is the compactor included in the “all” target when the documentation indicates it’s a singleton?

Q3: Is there a way to “subtract” the compactor from the “all” target? (i.e. “all,!compactor” )

Q4: Any other advice?

Just to circle back on this, I sorted out my configuration by reading the code. :frowning:
This configuration allows all three nodes to start, run, and handle single-node restarts.

The 2.4.2 upgrade plus the configuration changes have also eliminated these errors:

2022-02-21 12:01:18 level=error ts=2022-02-21T17:01:18.322640776Z caller=flush.go:220 org_id=fake msg="failed to flush user" err="RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
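To confirm the memberlist cluster actually re-forms after a single-node restart, each node's HTTP port also serves a /memberlist status page listing the members that node knows about. A sketch (substitute your own hosts and HTTP port):

```shell
# Check each node's view of the memberlist cluster (hosts/port are placeholders).
for node in loki-1.internal loki-2.internal loki-3.internal; do
  echo "== ${node} =="
  # The page lists the members this node has gossiped with;
  # all three nodes should appear on every node's page
  curl -s --max-time 5 "http://${node}:3100/memberlist" || echo "unreachable"
done
```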

Configuration (Terraform template):

# Adapted from https://grafana.com/docs/loki/latest/configuration/examples/#almost-zero-dependencies-setup
auth_enabled: false

common:
  ring:
    kvstore:
      store: memberlist

server:
  http_listen_port: ${http_port}
  grpc_listen_port: ${grpc_port}
  log_level: warn

chunk_store_config:
  chunk_cache_config:
    enable_fifocache: false

query_range:
  results_cache:
    cache:
      enable_fifocache: false

ingester:
  lifecycler:
    ring:
      replication_factor: 1
    final_sleep: 0s
  # https://grafana.com/docs/loki/latest/best-practices/#use-chunk_target_size
  chunk_target_size: 153600
  max_chunk_age: 1h
  flush_check_period: 5s
  flush_op_timeout: 5m
  chunk_idle_period: 30m
  chunk_retain_period: 30s
  # This must be set to 0 to use the WAL
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /data/wal

memberlist:
  abort_if_cluster_join_fails: false

  # Expose this port on all distributor, ingester
  # and querier replicas.
  bind_addr:
  - ${local_ip}
  bind_port: ${bind_port}

  join_members:
%{ for ip in loki_ips ~}
  - ${ip}
%{ endfor }

  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
  rejoin_interval: 30s

schema_config:
  configs:
  - from: 2020-05-15
    store: boltdb-shipper
    object_store: s3
    schema: v11
    index:
      prefix: index_
      period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/index
    cache_location: /data/loki/index_cache
    shared_store: s3

  aws:
    bucketnames: ${storage_bucket}
    region: ${region}
    insecure: false
    sse_encryption: true

limits_config:
  retention_period: ${retention_period}
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

compactor:
  working_directory: /data/retention
  shared_store: s3
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
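For completeness, a template like the one above gets rendered per node with Terraform's templatefile() function. A minimal sketch of the wiring (the variable names match the template's placeholders; all the values shown are illustrative, not our real ones):

```hcl
# Render the Loki config for one node (values are illustrative placeholders).
locals {
  loki_config = templatefile("${path.module}/loki.yml.tpl", {
    http_port        = 3100
    grpc_port        = 9096
    bind_port        = 7946
    local_ip         = "10.0.1.11"                              # this node's private IP
    loki_ips         = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]  # all three nodes
    storage_bucket   = "example-loki-chunks"
    region           = "us-east-1"
    retention_period = "744h"
  })
}
```

Each node gets its own render with its own local_ip, while loki_ips stays the same list everywhere so memberlist's join_members covers the whole cluster.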