Using Explore, we tried to simulate shipping logs from the server to Loki to see how ‘real time’ it is.
We get different results: sometimes a 10-second delay, sometimes an hour. But I believe the idea of real time is that logs should show up with at most a one-second delay, right?
We are using Promtail, and have verified there is no network bottleneck.
Depending on your settings, log streams (especially the small ones) can stay on the ingester for a while before being written to storage. The Loki reader can query the Loki writer within a certain time frame (the query_ingesters_within configuration), so if you are seeing a delay, a likely cause is either that query_ingesters_within is not configured correctly, or that your readers can’t connect to the writers.
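For reference, a minimal sketch of where that setting lives (the value shown is the default in recent Loki versions, not a recommendation; check your version's docs):

```yaml
# Sketch only: query_ingesters_within tells the readers how far back they
# should still ask the ingesters (writers) for data that has not yet been
# flushed to storage. It should cover max_chunk_age, and the readers must
# be able to reach the writers over gRPC.
querier:
  query_ingesters_within: 3h   # default in recent Loki versions
```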
Just to provide more context, our log volume reaches up to 10,000 lines per second at peak times, and the issue now is that Grafana does not display the full volume of logs being sent. We suspect Loki is not keeping up with that volume, but we are unsure.
We tested this in two ways:
Test 1: Server 1 (Loki) ← Server 2 (Promtail)
Test 2: Server 1 (Loki + Promtail)
Both simulated 10,000 logs. Both failed to get the data fast enough and in full (meaning a huge chunk of logs was missing).
I think your configuration overall looks good. Perhaps try increasing the size of chunks before they are flushed by tweaking chunk_target_size, so your ingester writes to the file system less often.
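Something along these lines in the ingester block, purely as a sketch (the numbers are illustrative examples, not recommendations; check the defaults for your Loki version):

```yaml
# Sketch only: a larger chunk_target_size means fewer, bigger flushes.
ingester:
  chunk_target_size: 4194304   # ~4 MB compressed target (default is ~1.5 MB)
  chunk_idle_period: 30m       # flush a stream's chunk after 30m without new entries
  max_chunk_age: 2h            # force a flush once a chunk reaches this age
```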
But honestly it may be time for you to consider simple scalable mode with an object storage backend.
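Roughly, that means running the same Loki binary with -target=write for the writers and -target=read for the readers, and pointing both at shared object storage. A sketch of the common storage section, assuming an S3-compatible store (the endpoint, bucket name, and credentials below are placeholders):

```yaml
# Sketch only: shared config for a simple scalable deployment.
# Start the writers with -target=write and the readers with -target=read.
common:
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist
  storage:
    s3:
      endpoint: s3.example.com            # placeholder endpoint
      bucketnames: loki-chunks            # placeholder bucket name
      access_key_id: ${S3_ACCESS_KEY}     # placeholder credentials
      secret_access_key: ${S3_SECRET_KEY}
      s3forcepathstyle: true
```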