I wonder if anyone in the community has faced the same issue.
We’ve been using Loki, deployed to an on-prem OpenShift cluster, quite successfully for some time now. However, from time to time we see issues with pods that produce very verbose logs, e.g. 300 lines per second.
Here’s an example log:
25-08-2021 14:19:45.822 log goes on
25-08-2021 14:19:46.257 log goes on
25-08-2021 14:19:46.257 log goes on
25-08-2021 14:19:46.259 log goes on
25-08-2021 14:19:46.258 log goes on
We see log lines that were produced earlier showing up after more recent ones (the last 3 lines of the example above, where the 14:19:46.258 line appears after the 14:19:46.259 line). Some of the log lines contain multi-line SQL queries, and when the lines aren’t ordered properly the logs are really difficult to read.
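To make the ordering problem concrete, here is a minimal Python sketch that parses the DD-MM-YYYY HH:MM:SS.mmm timestamps used in the example above and flags every line that is older than a line shown before it. The names parse_ts and find_out_of_order are hypothetical helpers for checking an exported log sample; they are not part of Loki or Promtail.

```python
from datetime import datetime

# Assumed timestamp format, taken from the example lines: "DD-MM-YYYY HH:MM:SS.mmm"
TS_FORMAT = "%d-%m-%Y %H:%M:%S.%f"


def parse_ts(line: str) -> datetime:
    """Parse the 23-character timestamp at the start of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def find_out_of_order(lines: list[str]) -> list[int]:
    """Return indices of lines whose timestamp is earlier than the newest one seen before them."""
    out_of_order = []
    newest = None
    for i, line in enumerate(lines):
        ts = parse_ts(line)
        if newest is not None and ts < newest:
            out_of_order.append(i)  # this line was produced before something already displayed
        else:
            newest = ts
    return out_of_order


sample = [
    "25-08-2021 14:19:45.822 log goes on",
    "25-08-2021 14:19:46.257 log goes on",
    "25-08-2021 14:19:46.257 log goes on",
    "25-08-2021 14:19:46.259 log goes on",
    "25-08-2021 14:19:46.258 log goes on",
]
print(find_out_of_order(sample))  # → [4]
```

Equal timestamps (the two 14:19:46.257 lines) are not treated as out of order; only a strictly older timestamp is flagged.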
We checked the pod logs directly and found no clock issues: the app writes its log lines in the correct order, and there were no NTP or CPU issues either. This happens only with very verbose logs.
Are there any settings to handle such situations, e.g. buffering logs and sending them to Loki in bigger batches?
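A rough sketch of the buffering idea, assuming a hypothetical client-side step between the app and the log shipper (this is not an existing Promtail or Loki setting): lines are collected into a batch and sorted by their own timestamp before the batch is sent. ReorderBuffer, batch_size and the helper names are illustrative only.

```python
from datetime import datetime

# Assumed timestamp format, taken from the example lines: "DD-MM-YYYY HH:MM:SS.mmm"
TS_FORMAT = "%d-%m-%Y %H:%M:%S.%f"


def parse_ts(line: str) -> datetime:
    """Parse the 23-character timestamp at the start of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


class ReorderBuffer:
    """Hypothetical buffer: collects lines and releases them as a timestamp-sorted batch."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._pending: list[str] = []

    def add(self, line: str) -> list[str]:
        """Queue a line; return a sorted batch once batch_size lines are pending, else []."""
        self._pending.append(line)
        if len(self._pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self) -> list[str]:
        """Return all pending lines sorted by timestamp (sorted() is stable for equal timestamps)."""
        batch = sorted(self._pending, key=parse_ts)
        self._pending = []
        return batch


sample = [
    "25-08-2021 14:19:45.822 log goes on",
    "25-08-2021 14:19:46.257 log goes on",
    "25-08-2021 14:19:46.257 log goes on",
    "25-08-2021 14:19:46.259 log goes on",
    "25-08-2021 14:19:46.258 log goes on",
]
buf = ReorderBuffer(batch_size=5)
batch = []
for line in sample:
    batch.extend(buf.add(line))  # only the fifth add() returns a non-empty batch
for line in batch:
    print(line)  # a real setup would push the batch to the log shipper instead
```

Sorting only happens within a batch, so a line that arrives after its batch has already been flushed would still appear out of order; a larger batch (or a time-based flush window) trades extra latency for better ordering.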
We’re using Promtail and Loki 2.1.0, with an NFSv3-based persistent volume (NetApp) for Loki data storage. The Loki StatefulSet is scaled to 1 replica and uses the default CPU/memory requests and limits.