At least 1 live replicas required, could only find 0 - unhealthy instances

obiwan · October 12, 2023, 5:34pm

I have Loki giving me this warning:

level=warn ts=2023-10-12T17:24:00.469737712Z caller=logging.go:123
  orgID=fake
  msg="POST /loki/api/v1/push (500) 859.661µs
       Response: \"at least 1 live replicas required, could only find 0
       - unhealthy instances: 10.254.0.86:65300\\n\"
  ws: false;
  Connection: close;
  Content-Length: 29114;
  Content-Type: application/x-protobuf;
  User-Agent: promtail/2.9.0;
  X-Forwarded-For: 1.2.3.4;
  X-Internal-Remote-Address: 1.2.3.4; "

I have three instances running. Note the "unhealthy instances: 10.254.0.86:65300" part of the response. That IP belonged to a previous Loki instance that has since been terminated. Why are the current Loki instances still looking for the old one? And more importantly, how can I tell them to stop and only look at the current instances?

obiwan · October 12, 2023, 6:27pm

If I visit the /ready endpoint, I see the message:

Ingester not ready: instance 10.254.0.56:63552 past heartbeat timeout

And in the logs I see this entry:

level=warn ts=2023-10-12T18:23:29.021351537Z caller=lifecycler.go:291
  msg="found an existing instance(s) with a problem in the ring, this
       instance cannot become ready until this problem is resolved.
       The /ring http endpoint on the distributor (or single binary)
       provides visibility into the ring."
  ring=ingester
  err="instance 10.254.0.56:63552 past heartbeat timeout"

If I go to the /distributor/ring endpoint I can see the “Ring Status” page. When I click the “Forget” button on every instance and then refresh, the instances come back. Meanwhile, the logs still complain about unhealthy instances that no longer exist.

obiwan · October 12, 2023, 6:54pm

Thanks to this issue I found the ingester.autoforget_unhealthy: true configuration parameter. With it enabled, the logs show the stale instances being removed:

level=info ts=2023-10-12T18:45:09.898482094Z caller=ingester.go:390 msg="autoforget removed ingester old-loki-instance-001 from the ring because it was not healthy after 1m0s"
level=info ts=2023-10-12T18:45:09.898511469Z caller=ingester.go:390 msg="autoforget removed ingester old-loki-instance-002 from the ring because it was not healthy after 1m0s"
level=info ts=2023-10-12T18:45:09.898519995Z caller=ingester.go:390 msg="autoforget removed ingester old-loki-instance-003 from the ring because it was not healthy after 1m0s"

The frustrating thing is that none of those showed up at the /distributor/ring endpoint.

Is there another/better way to see and manually “forget” instances that go unhealthy and never return?

rahuldhar · September 13, 2024, 4:02pm

I have the same issue with my Loki ingester pods. Where do I add this parameter? Could you please confirm?

obiwan · September 13, 2024, 4:43pm

I put it in the ingester stanza:

        ingester:
          lifecycler:
            final_sleep: 0s
          chunk_idle_period: 5m
          chunk_retain_period: 30s
          max_transfer_retries: 0
          #autoforget_unhealthy: true # uncomment if instances become unhealthy and never return
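
For anyone landing here later, a minimal sketch of the same stanza with the option switched on might look like the block below. The ring.heartbeat_timeout key and its 1m value are an assumption about where the "1m0s" in the autoforget log lines comes from; check the configuration reference for your Loki version before relying on it.

```yaml
ingester:
  lifecycler:
    final_sleep: 0s
    ring:
      # Assumption: instances whose last heartbeat is older than this
      # are treated as unhealthy (matches the "1m0s" in the log lines).
      heartbeat_timeout: 1m
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  # Remove ring entries for ingesters that stay unhealthy and never return.
  autoforget_unhealthy: true
```

If this sits inside a Helm values file, the whole block is nested under whatever key the chart uses for the Loki config, so the indentation will be deeper than shown here.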
