In my environment (Grafana 11.1.0 / Loki 3.1.0), low-volume streams take longer to become searchable by the querier than high-volume streams. This is especially problematic for very low-volume streams (1 event per day or less), where it can take more than 24 hours before the event is found by the querier. It seems like the ingester is not properly queried by the querier. Since I send the same event to two different log-aggregation systems, I know that the event was sent without delay. query_ingesters_within is set to 0 (but I also tried 96h). Any help is appreciated.
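For reference, this is roughly how I have it set (assuming the setting belongs under the querier block, which is where it sits in my config):

```yaml
querier:
  # 0 should make the querier always ask the ingesters, regardless of the query time range
  query_ingesters_within: 0
```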
Thank you @tonyswumac. I followed your advice, changed instance_addr to 0.0.0.0, and restarted the Loki service. Unfortunately, I do not see much of a difference between binding the Loki server to the loopback interface and binding it to all interfaces.
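For completeness, the change looks roughly like this (assuming instance_addr belongs under the common block, which is where I have it):

```yaml
common:
  # was 127.0.0.1 (loopback) before; now the ring address binds to all interfaces
  instance_addr: 0.0.0.0
```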
To your second point, could you please explain in more detail what a memberlist configuration for a single-binary deployment should look like? Until now, my understanding was that it is only needed for distributed deployments.
Hi @tonyswumac, thank you for testing my configurations in your setup. Some differences I see between the two setups:
You are using Docker. I am using a “direct install”.
I am running two Loki server processes on the same host (each with separate external and internal ports; see the sketch after this list). But maybe there is some conflict between the two server processes that I do not realize.
The two Loki servers have some load: server process 1 handles about 400,000 events/min and server process 2 about 50,000 events/min. At the stream level, the highest-volume streams have about 60 events/minute and the lowest-volume streams only a few events/day.
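To illustrate the port separation, the two server blocks look roughly like this (the ports shown are examples, not my exact values):

```yaml
# config file of server process 1
server:
  http_listen_port: 3100
  grpc_listen_port: 9095
common:
  path_prefix: /var/lib/loki-1
```

```yaml
# config file of server process 2
server:
  http_listen_port: 3101
  grpc_listen_port: 9096
common:
  path_prefix: /var/lib/loki-2
```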
Ok, this is information I didn’t know. The way Loki storage works is that new logs come into the ingester, get written to the WAL, and are not committed to long-term storage for some time (how long depends on your chunk age and chunk size settings). During this window, your querier needs to query the ingester directly to get the logs that aren’t committed to long-term storage yet.
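The knobs that control this live in the ingester block; a minimal sketch (the values shown are Loki’s defaults, not a recommendation):

```yaml
ingester:
  # flush a chunk if a stream receives no new logs for this long;
  # low-volume streams get flushed by this (or max_chunk_age), not by size
  chunk_idle_period: 30m
  # force-flush any chunk older than this, even if it is still small
  max_chunk_age: 2h
  # target compressed chunk size in bytes (~1.5 MB); high-volume streams fill this quickly
  chunk_target_size: 1572864
```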
If you are running two Loki servers, then you need to use memberlist to form a cluster between the two.
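A minimal sketch of what that could look like, assuming both servers can reach each other on port 7946 (hostnames are placeholders):

```yaml
# same block on both servers
memberlist:
  join_members:
    - loki-1:7946
    - loki-2:7946
common:
  ring:
    kvstore:
      store: memberlist
```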
@tonyswumac, I have two Loki server processes that should run side by side and have no contact with each other at all. That means if I query server process 1, I should only get results from server 1, and the same for server process 2. That’s why I don’t want a memberlist spanning the two Loki servers. I could still run two separate memberlists, one per server process, but I would not know how to configure that.
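My naive guess would be something like the following, with distinct bind ports and each process joining only itself so the two rings never meet, but I have no idea whether this is correct:

```yaml
# config file of server process 1
memberlist:
  bind_port: 7946
  join_members:
    - 127.0.0.1:7946
```

```yaml
# config file of server process 2
memberlist:
  bind_port: 7947
  join_members:
    - 127.0.0.1:7947
```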
I am not sure how else I can be of help, since I am unable to reproduce your problem.
I had assumed that you were running two Loki containers on the same host as a cluster. But if you are not, and they are supposed to be separate entities, then I would recommend separating them physically as well and seeing if that fixes your problem.
Hi @tonyswumac, thanks again for your help. I finally found the problem in my setup. It was not in Loki itself but in the preceding log collection pipeline, which did not push low-volume streams to Loki in a timely manner (in contrast to the second channel going to another log aggregation system). After upgrading the log collector, the issue is resolved.