Hello folks,
We’re currently testing the resiliency of our Loki infrastructure, which is hosted on AWS. Here is the configuration we use.
We made sure that the ingesters and distributors are spread across 3 different AZs (and we did the same for the queriers).
The test is a simulation of network failures between AZ A and AZ B.
What we observed is that each of our distributors only considers the ingester located in its own AZ to be healthy:
Distributor AZ A:
- Ingester AZ A: OK
- Ingester AZ B: KO
- Ingester AZ C: KO
Distributor AZ B:
- Ingester AZ A: KO
- Ingester AZ B: OK
- Ingester AZ C: KO
Distributor AZ C:
- Ingester AZ A: KO
- Ingester AZ B: KO
- Ingester AZ C: OK
We expected that in this scenario:
- A can only see C (so we thought that the distributor in A should consider A and C healthy)
- B can only see C (so we thought that the distributor in B should consider B and C healthy)
- C can see both A and B (so we thought that the distributor in C should consider everything healthy)
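To make the expectation above concrete, here is a minimal sketch of the health view we had in mind. It is not Loki code and says nothing about how Loki actually decides ingester health. It only models direct network reachability between AZs, and the names `BROKEN_LINKS`, `can_reach` and `expected_view` are hypothetical, made up for this illustration.

```python
AZS = ["A", "B", "C"]

# Assumption: the only link cut by the simulated failure is A <-> B (undirected).
BROKEN_LINKS = {frozenset({"A", "B"})}


def can_reach(src, dst):
    """Return True if src can talk to dst directly (an AZ always reaches itself)."""
    return src == dst or frozenset({src, dst}) not in BROKEN_LINKS


def expected_view(distributor_az):
    """Ingester health as we expected a distributor in this AZ to see it."""
    return {az: "OK" if can_reach(distributor_az, az) else "KO" for az in AZS}


if __name__ == "__main__":
    for az in AZS:
        print(f"Distributor AZ {az}: {expected_view(az)}")
```

Under this model only the ingester on the other side of the cut link would be KO for the distributors in A and B, which differs from the result we actually observed.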
It seems that every member must be able to reach an endpoint for it to be considered healthy. Is this test case possibly not covered by Loki yet? Have we missed something?
Thanks in advance!