Loki in HA on Docker Swarm

I run Loki with S3 (MinIO) storage for logs and am trying to run it in HA on Docker Swarm with 2 nodes. The problem is that the two instances are not able to form a ring. This is my Loki configuration (the only difference between the nodes is the node name):

auth_enabled: false

server:
  http_listen_port: 3100

common:
  instance_interface_names:
    - "lo"
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 2
  ring:
    instance_interface_names:
      - "lo"
    kvstore:
      store: memberlist

memberlist:
  abort_if_cluster_join_fails: false
  randomize_node_name: false
  node_name: loki1
  bind_port: 7946

  join_members:
  - loki1:7946
  - loki2:7946

  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s

compactor:
  working_directory: /loki/compactor
  shared_store: s3
  compaction_interval: 5m

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
    cache_ttl: 24h

  aws:
    s3: http://minio:9000
    bucketnames: loki
    endpoint: minio:9000
    insecure: true
    access_key_id: minio
    secret_access_key: miniominio
    s3forcepathstyle: true

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

They are able to see each other, and both of them have port 7946 open:

# From loki1
netstat -ltpn  | grep 7946 && ping -c 1 loki2
tcp        0      0 :::7946                 :::*                    LISTEN      1/loki
PING loki2 (10.0.6.162): 56 data bytes
64 bytes from 10.0.6.162: seq=0 ttl=42 time=0.094 ms

--- loki2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.094/0.094/0.094 ms

# From loki2
netstat -ltpn  | grep 7946 && ping -c 1 loki1
tcp        0      0 :::7946                 :::*                    LISTEN      1/loki
PING loki1 (10.0.6.160): 56 data bytes
64 bytes from 10.0.6.160: seq=0 ttl=42 time=0.150 ms

--- loki1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.150/0.150/0.150 ms
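
Note that ping only shows ICMP reachability and netstat only shows a local listener; neither shows whether the other node's name resolves from inside the container or whether its port 7946 accepts TCP connections. A minimal sketch of such a check in Python, assuming Python 3 is available in some container attached to the same overlay network (the helper name check_member is hypothetical and not part of Loki):

```python
import socket


def check_member(member, timeout=2.0):
    """Resolve a 'host:port' string and attempt a TCP connection.

    Returns (resolved_ips, connected). An empty list means the name
    did not resolve; connected is False if the TCP connect failed.
    """
    host, _, port = member.rpartition(":")
    try:
        infos = socket.getaddrinfo(host, int(port), type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same class of failure as "no such host" in a resolver log line
        return [], False
    ips = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return ips, True
    except OSError:
        # Refused, timed out, or unreachable
        return ips, False
```

Calling check_member("loki2:7946") from the loki1 side, and check_member("loki1:7946") from the loki2 side, would show whether each entry in join_members resolves and accepts connections at that moment.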

When I check the distributor ring API on Loki1 and on Loki2:

  • Loki1 does not see Loki2.
  • Loki2 does see Loki1 but evaluates it as Unhealthy. The ownership split varies from time to time; last time it was 49 to 51 percent.

From the logs on Loki1:

2022-06-25T22:24:41.292941593Z ts=2022-06-25T22:24:41.292132302Z caller=memberlist_logger.go:74 level=warn msg="Failed to resolve loki2:7946: lookup loki2 on 127.0.0.11:53: no such host"
2022-06-25T22:24:44.318437095Z ts=2022-06-25T22:24:44.317763386Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki2' from=[::]:7946"
2022-06-25T22:24:46.319644804Z ts=2022-06-25T22:24:46.317614471Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node loki2 from=127.0.0.1:42442"
2022-06-25T22:24:46.319867679Z ts=2022-06-25T22:24:46.317690929Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: EOF"
2022-06-25T22:24:49.316437708Z ts=2022-06-25T22:24:49.315831583Z caller=memberlist_logger.go:74 level=info msg="Suspect loki2 has failed, no acks received"
2022-06-25T22:24:54.318574960Z ts=2022-06-25T22:24:54.31814571Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki2' from=[::]:7946"
2022-06-25T22:24:56.318793253Z ts=2022-06-25T22:24:56.318574211Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node loki2 from=127.0.0.1:42480"
2022-06-25T22:24:56.318846295Z ts=2022-06-25T22:24:56.318640586Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: EOF"
2022-06-25T22:25:04.318133298Z ts=2022-06-25T22:25:04.316741132Z caller=memberlist_logger.go:74 level=info msg="Suspect loki2 has failed, no acks received"
2022-06-25T22:25:04.318222173Z ts=2022-06-25T22:25:04.317927423Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki2' from=[::]:7946"
2022-06-25T22:25:06.318917258Z ts=2022-06-25T22:25:06.318676008Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node loki2 from=127.0.0.1:42496"
2022-06-25T22:25:06.318963133Z ts=2022-06-25T22:25:06.318794924Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: EOF"
2022-06-25T22:25:09.322544342Z ts=2022-06-25T22:25:09.322220051Z caller=memberlist_logger.go:74 level=info msg="Marking loki2 as failed, suspect timeout reached (0 peer confirmations)"
2022-06-25T22:25:19.316513625Z ts=2022-06-25T22:25:19.316218583Z caller=memberlist_logger.go:74 level=info msg="Suspect loki2 has failed, no acks received"
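
Since the same few messages repeat, it can help to group them. The following is a rough sketch, assuming log lines in the level=... msg="..." format shown above; the function name summarize and the `<addr>` placeholder are illustrative choices, not anything provided by Loki:

```python
import re
from collections import Counter

# Matches the trailing 'level=<level> msg="<message>"' part of a log line
LINE_RE = re.compile(r'level=(?P<level>\w+) msg="(?P<msg>.*)"\s*$')


def summarize(lines):
    """Count memberlist log messages by (level, message).

    The 'from=<address>' suffix is collapsed so that lines differing
    only in the source address or ephemeral port are grouped together.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that is not in the expected format
        msg = re.sub(r"from=\S+", "from=<addr>", match.group("msg"))
        counts[(match.group("level"), msg)] += 1
    return counts
```

Feeding the excerpt above through summarize(open("loki1.log")) (the file name is a placeholder) would group the repeated "Got ping for unexpected node" warnings, whose source addresses in this excerpt are all [::] or 127.0.0.1.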

Following the documentation, I have configured the common.ring section and the memberlist section, but judging by the errors, the nodes still cannot form a cluster.

Can somebody help me find out why loki1 and loki2 cannot form a cluster?
Is it OK to point both of them at the same S3 bucket? From the logs, it looks like a different kind of problem than shared storage.

Thank you for any tips.
