Loki 2.4.1 empty ring Code(500) error for "GET /loki/api/v1/labels" API on AWS ECS

tonyswumac · April 4, 2023, 4:08pm

That is not going to work. First, every container in the Loki cluster has to be able to connect to the writer containers “directly”. To do that, your containers have to be able to discover the writers through service discovery. If you look at the configuration here (About Grafana Mimir DNS service discovery | Grafana Mimir documentation), you’ll notice that even with SRV discovery the port isn’t propagated, which means your writers have to use their native port (that is, no bridge networking). Second, if you are using simple scalable mode, it doesn’t make sense to run the writer and reader in one ECS service. The whole point of simple scalable mode is that you can scale writers and readers separately; by putting them in the same ECS service you’ve more or less given that up.

Short of giving you our code (it’s work-related, so I can’t really share it), here is what I would recommend:

Separate your writers and readers. The easiest way to do this is to set up one ECS cluster with two Auto Scaling groups. You can give the instances in each group a different custom attribute (look up ECS_INSTANCE_ATTRIBUTES) like this:

```shell
echo 'ECS_INSTANCE_ATTRIBUTES={"loki-instance-catogery": "writer"}' >> /etc/ecs/ecs.config
```

and then constrain your ECS service to those instances with placement_constraints:

```hcl
placement_constraints {
  type       = "memberOf"
  expression = "attribute:loki-instance-catogery == writer"
}
```
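A minimal Terraform sketch of the writer Auto Scaling group might look like the following. It assumes the ECS-optimized Amazon Linux AMI, whose ECS agent reads /etc/ecs/ecs.config at boot, so the same file can both join the instance to the cluster (ECS_CLUSTER) and set the custom attribute. The variable names, instance type and group size are hypothetical placeholders; a reader group would mirror it with its own attribute value.

```hcl
# Hypothetical inputs; replace with your own values.
variable "cluster_name" { type = string }
variable "ecs_ami_id" { type = string }           # an ECS-optimized AMI
variable "ecs_instance_profile" { type = string } # instance profile for the ECS agent
variable "private_subnet_ids" { type = list(string) }

resource "aws_launch_template" "loki_writer" {
  name_prefix   = "loki-writer-"
  image_id      = var.ecs_ami_id
  instance_type = "m5.xlarge" # placeholder size

  iam_instance_profile {
    name = var.ecs_instance_profile
  }

  # Join the cluster and tag the instance with the attribute that the
  # memberOf placement constraint matches on.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    echo 'ECS_CLUSTER=${var.cluster_name}' >> /etc/ecs/ecs.config
    echo 'ECS_INSTANCE_ATTRIBUTES={"loki-instance-catogery": "writer"}' >> /etc/ecs/ecs.config
  EOT
  )
}

resource "aws_autoscaling_group" "loki_writer" {
  name_prefix         = "loki-writer-"
  min_size            = 3 # placeholder: one writer per instance
  max_size            = 3
  desired_capacity    = 3
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.loki_writer.id
    version = "$Latest"
  }
}
```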

Since writers need a dedicated persistent volume for the WAL, you might as well give each writer a dedicated host, so I recommend running the writer service with the DAEMON scheduling strategy. You’ll want a service discovery zone for your writers (say writers.services.discovery) that uses A records, not SRV, and you’ll want the awsvpc network mode because you need the native port.
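
A Terraform sketch of the writer side, assuming AWS Cloud Map backs the service discovery: a private DNS namespace called services.discovery plus a Cloud Map service called writers produces the name writers.services.discovery, with one A record per running writer task. The VPC, cluster, subnet, security group and task definition inputs are hypothetical, and the task definition itself must declare network_mode = "awsvpc".

```hcl
# Hypothetical inputs.
variable "vpc_id" { type = string }
variable "cluster_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "loki_sg_id" { type = string }
variable "writer_task_definition_arn" { type = string } # network_mode = "awsvpc"

# Private hosted zone "services.discovery" managed by Cloud Map.
resource "aws_service_discovery_private_dns_namespace" "loki" {
  name = "services.discovery"
  vpc  = var.vpc_id
}

# writers.services.discovery: A records only, no SRV.
resource "aws_service_discovery_service" "writers" {
  name = "writers"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.loki.id
    routing_policy = "MULTIVALUE" # return every healthy task IP

    dns_records {
      type = "A"
      ttl  = 10
    }
  }
}

resource "aws_ecs_service" "loki_writer" {
  name                = "loki-writer"
  cluster             = var.cluster_id
  task_definition     = var.writer_task_definition_arn
  launch_type         = "EC2"
  scheduling_strategy = "DAEMON" # one writer per matching instance

  # awsvpc: each task gets its own ENI and listens on its native ports.
  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.loki_sg_id]
  }

  # Registers each task's IP as an A record in writers.services.discovery.
  service_registries {
    registry_arn = aws_service_discovery_service.writers.arn
  }

  placement_constraints {
    type       = "memberOf"
    expression = "attribute:loki-instance-catogery == writer"
  }
}
```
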
For readers, I’d recommend another service discovery zone as well (again with A records, say readers.services.discovery) and the awsvpc network mode, but readers can use the REPLICA scheduling strategy.
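
The reader side could be sketched the same way, again assuming Cloud Map. The namespace is passed in as a hypothetical variable (the ID of the services.discovery namespace), the reader instances are assumed to carry a hypothetical `reader` value for the same custom attribute, and the replica count is a placeholder that you would scale independently of the writers.

```hcl
# Hypothetical inputs.
variable "discovery_namespace_id" { type = string } # services.discovery namespace
variable "cluster_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "loki_sg_id" { type = string }
variable "reader_task_definition_arn" { type = string } # network_mode = "awsvpc"

# readers.services.discovery, A records only.
resource "aws_service_discovery_service" "readers" {
  name = "readers"

  dns_config {
    namespace_id   = var.discovery_namespace_id
    routing_policy = "MULTIVALUE"

    dns_records {
      type = "A"
      ttl  = 10
    }
  }
}

resource "aws_ecs_service" "loki_reader" {
  name                = "loki-reader"
  cluster             = var.cluster_id
  task_definition     = var.reader_task_definition_arn
  launch_type         = "EC2"
  scheduling_strategy = "REPLICA"
  desired_count       = 2 # placeholder; scale readers on their own

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.loki_sg_id]
  }

  service_registries {
    registry_arn = aws_service_discovery_service.readers.arn
  }

  placement_constraints {
    type       = "memberOf"
    expression = "attribute:loki-instance-catogery == reader" # hypothetical value
  }
}
```
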
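In your Loki configuration, configure memberlist like so (the dns+ prefix makes Loki resolve each name as an A/AAAA lookup and join every returned address on port 7946):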

```yaml
memberlist:
  bind_addr: ['0.0.0.0']
  bind_port: 7946
  dead_node_reclaim_time: 30s
  gossip_interval: 2s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  join_members:
  - dns+writers.services.discovery:7946
  - dns+readers.services.discovery:7946
```
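
Because every task gets its own ENI in awsvpc mode, the tasks’ security group also has to let Loki members reach each other directly. A sketch, assuming all Loki tasks share one hypothetical security group and that Loki’s gRPC listen port is left at its default of 9095: it opens the memberlist port from the config above (the memberlist transport used by Loki gossips over TCP) and the gRPC port that readers use to query writers, to any member of the same group.

```hcl
variable "loki_sg_id" { type = string } # hypothetical: group shared by all Loki tasks

# Memberlist gossip (bind_port 7946 in the Loki config).
resource "aws_security_group_rule" "loki_memberlist" {
  type              = "ingress"
  security_group_id = var.loki_sg_id
  protocol          = "tcp"
  from_port         = 7946
  to_port           = 7946
  self              = true # only from other tasks in this group
  description       = "Loki memberlist gossip"
}

# gRPC between Loki components, e.g. readers querying writers.
resource "aws_security_group_rule" "loki_grpc" {
  type              = "ingress"
  security_group_id = var.loki_sg_id
  protocol          = "tcp"
  from_port         = 9095 # Loki default grpc_listen_port (assumed unchanged)
  to_port           = 9095
  self              = true
  description       = "Loki gRPC"
}
```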

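Lastly, awsvpc mode gives every task its own elastic network interface, so the number of tasks per instance is capped by how many ENIs the instance type supports. You can overcome that NIC limitation with ENI trunking; see Elastic network interface trunking - Amazon Elastic Container Service.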
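
If you manage account settings in Terraform as well, opting in could look like the sketch below. Per the AWS documentation, trunking only takes effect for container instances launched after the setting is enabled, and only on supported instance types, so the Auto Scaling groups may need their instances replaced afterwards.

```hcl
# Enables ENI trunking as the account default in the provider's region,
# raising the number of awsvpc tasks each supported instance can run.
resource "aws_ecs_account_setting_default" "awsvpc_trunking" {
  name  = "awsvpcTrunking"
  value = "enabled"
}
```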