Distributed Loki on Docker Swarm

I’m currently trying to deploy Loki with a distributed configuration to my Docker Swarm cluster.

For my setup, I’m only planning to split Loki into query-frontend, write and read components, since many online references follow this layout.

I’m running into issues getting the cluster to work correctly. I’ve had success with a single-node configuration, but I want the option to scale up.

I’m using Traefik as a reverse proxy to handle client requests.

The errors I’m getting are:

"Remote state is encrypted and encryption is not configured\n\t* Failed to join VIRTUAL_IP:7946: Remote state is encrypted and encryption is not configured"

Here VIRTUAL_IP is the virtual IP configured for the swarm cluster, which Traefik handles. I’m not sure what "encryption" means in this context; I found little information online apart from material related to Consul.
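One possible reading of this error, offered as a guess rather than a confirmed diagnosis: Docker Swarm uses port 7946 (TCP and UDP) for its own node-to-node gossip, so a join attempt against VIRTUAL_IP:7946 may be reaching Swarm's gossip instead of another Loki instance. The `traefik.http.routers` labels also define HTTP routers, while memberlist gossip is raw TCP/UDP traffic. A minimal sketch of joining over the overlay network directly, bypassing Traefik, could look like the fragment below. The port 7947 and the `tasks.loki-write` / `tasks.loki-read` names are assumptions for illustration: they presume the services share an overlay network and are reachable under those short names, and depending on the stack name the form `tasks.<stack>_loki-write` may be needed.

```yaml
# Sketch only: port 7947 and the tasks.* hostnames are assumptions,
# not values taken from the configuration in this post.
memberlist:
  bind_port: 7947              # a port other than 7946, which Swarm uses for its own gossip
  join_members:
    # tasks.<service> is the Swarm DNS name that resolves to the task IPs of a service
    - tasks.loki-write:7947
    - tasks.loki-read:7947
```

With this approach the memberlist port would not need any Traefik labels, since gossip would stay inside the overlay network.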

msg="POST /loki/api/v1/push (500) 689.391µs Response: \"at least 2 live replicas required, could only find 1\\n\"

I imagine this is because too few ingesters are available, since the ingesters can’t join the gossip ring. If I access the ring or memberlist endpoints, I only see one node.
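The numbers in that message are consistent with the configured replication_factor of 3: a write needs a quorum of floor(3/2) + 1 = 2 live ingesters, and only 1 is visible in the ring. As a debugging sketch only, and not a setting to keep, temporarily lowering the replication factor would let a lone ingester accept writes, which could help separate the ring-membership problem from the rest of the write path.

```yaml
# Debugging sketch only: not intended as a production setting.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1    # quorum becomes 1, so a single live ingester is enough
```

If pushes succeed with this change, the remaining problem is most likely that the instances cannot discover each other over memberlist.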

I’d appreciate an extra pair of eyes that can see what exactly I’m missing or what I ‘oofed’ on.

Configuration:

Docker Stack File

  loki:
    ...
    command: -config.file=/etc/loki/local-config.yaml -target=query-frontend -frontend.downstream-url=http://loki-read.org.com
    deploy:
      mode: replicated
      replicas: 3    
      labels:
        ...
        - traefik.http.routers.loki.rule=Host(`loki-swarm.org.com`)
        - traefik.http.routers.loki.entrypoints=http
        - traefik.http.services.loki_svc.loadbalancer.server.port=3100
        - traefik.http.routers.loki.service=loki_svc

        # Memberlist
        - traefik.http.routers.loki_ml.rule=Host(`loki-swarm.org.com`)
        - traefik.http.routers.loki_ml.entrypoints=memberlist
        - traefik.http.services.loki_ml_svc.loadbalancer.server.port=7946
        - traefik.http.routers.loki_ml.service=loki_ml_svc

  loki-write:
    ...
    command: -config.file=/etc/loki/local-config.yaml -target=write
    deploy:
      mode: replicated
      replicas: 3        
      labels:
        ...
        - traefik.http.routers.loki_write.rule=Host(`loki-write.org.com`)
        - traefik.http.routers.loki_write.entrypoints=http
        - traefik.http.services.loki_write_svc.loadbalancer.server.port=3100
        - traefik.http.routers.loki_write.service=loki_write_svc

        # Memberlist
        - traefik.http.routers.loki_write_ml.rule=Host(`loki-write.org.com`)
        - traefik.http.routers.loki_write_ml.entrypoints=memberlist         
        - traefik.http.services.loki_write_ml_svc.loadbalancer.server.port=7946
        - traefik.http.routers.loki_write_ml.service=loki_write_ml_svc

  loki-read:
    ...
    command: -config.file=/etc/loki/local-config.yaml -target=read
    deploy:
      mode: replicated
      replicas: 3    
      labels:
        ...
        - traefik.http.routers.loki_read.rule=Host(`loki-read.org.com`)
        - traefik.http.routers.loki_read.entrypoints=http       
        - traefik.http.services.loki_read_svc.loadbalancer.server.port=3100
        - traefik.http.routers.loki_read.service=loki_read_svc

        # Memberlist
        - traefik.http.routers.loki_read_ml.rule=Host(`loki-read.org.com`)
        - traefik.http.routers.loki_read_ml.entrypoints=memberlist      
        - traefik.http.services.loki_read_ml_svc.loadbalancer.server.port=7946
        - traefik.http.routers.loki_read_ml.service=loki_read_ml_svc 

loki.yaml

auth_enabled: true

analytics:
  reporting_enabled: false

server:
  http_listen_port: 3100
  log_level: debug

common:
  compactor_address: http://loki-write.org.com

memberlist:
  join_members: 
    - 'loki-write.org.com'
    - 'loki-read.org.com'

  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  gossip_interval: 2s
  bind_addr: ['0.0.0.0']
  bind_port: 7946  

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  wal:
    enabled: true
    dir: /loki/wal
  lifecycler:
    join_after: 10s
    observe_period: 5s
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  flush_op_timeout: 10s
  max_chunk_age: 1m

schema_config:
  configs:
  - from: 2020-05-15
    store: boltdb-shipper
    object_store: s3
    schema: v11
    index:
      prefix: index_
      period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    resync_interval: 5s
    shared_store: s3
  aws:
    s3forcepathstyle: true
    bucketnames: loki
    endpoint: dev-minio.org.com
    region: us-east
    access_key_id: user
    secret_access_key: pass123
    insecure: true
    sse_encryption: false    

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 30m

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: aws
  compaction_interval: 5m