Using Loki for Information Security Oriented Logs - Guidance?

Hello!

We’re exploring replacing an Elastic Stack cluster with Loki. Are there any references around something like “I want to use Loki for information security related logs [among others]”, covering stuff like:

How do we avoid untenable situations like observability/debug-y logs triggering limits like max_global_streams_per_user=5000, resulting in logs that should never be dropped… being dropped? QoS of some type? Or do data sources or some other scope each get their own max_global_streams_per_user so we could spread things out that way?

And so forth. Basically, I don’t know what I don’t know, and I want to make sure we don’t rush into using Loki for something it was not intended or designed for. I assume there may be other gotchas we’re not considering; the limit example I gave was just the first issue that came up during our POC. (I’m ignoring Promtail not sufficiently parsing Windows events, as we can work around that.)

Cheers!

I think it depends on how critical your logs are. In this respect, I’d say Elasticsearch and Loki are the same: if you overwhelm either one, then… it’s overwhelmed. So you should tackle the problem from both the client and the server end.

On the server end, you probably want to run stress tests and come up with a set of limit configurations that you are happy with, then make sure you have monitoring in place so those limits aren’t exceeded accidentally.
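
On the limits question specifically: as far as I know, max_global_streams_per_user is enforced per tenant ("user" means tenant in Loki), so one option is to push security logs and debug logs under different tenant IDs and tune the limits for each tenant separately. Here is a minimal sketch of that routing, assuming a multi-tenant Loki (auth_enabled: true). The tenant names, the host in the URL, and the helper name are all placeholders I made up:

```python
import json
import time
import urllib.request

# Placeholder host; /loki/api/v1/push is Loki's HTTP push endpoint.
LOKI_PUSH_URL = "http://loki.example.internal:3100/loki/api/v1/push"

# Hypothetical mapping from a log category to a Loki tenant ID.
TENANT_BY_CATEGORY = {"security": "security", "debug": "debug"}


def build_push_request(category, labels, line, ts_ns=None):
    """Build (but do not send) a push request scoped to the category's tenant."""
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    payload = {
        "streams": [
            # One stream per unique label set; timestamps are nanosecond strings.
            {"stream": labels, "values": [[str(ts_ns), line]]}
        ]
    }
    return urllib.request.Request(
        LOKI_PUSH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The tenant is selected by this header, so each category
            # counts against its own tenant's stream limit.
            "X-Scope-OrgID": TENANT_BY_CATEGORY[category],
        },
        method="POST",
    )
```

Sending it would be something like urllib.request.urlopen(build_push_request("security", {"job": "auth"}, "login failed")). In practice your log agent would set the tenant rather than hand-written code, but the idea is the same: noisy debug streams can only exhaust the debug tenant's budget.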

On the client end, consider using a log agent that is capable of caching logs locally for some duration.
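
To illustrate the idea, here is a rough sketch of a bounded in-memory buffer that keeps entries until a send succeeds. BufferedShipper and the send callable are made up for illustration; a real agent would typically persist to disk and handle all of this for you.

```python
from collections import deque


class BufferedShipper:
    """Hold log lines locally and only remove them once a send succeeds."""

    def __init__(self, send, max_entries=10_000):
        # `send` is a hypothetical callable that takes a list of lines
        # and raises an exception when the backend is unavailable.
        self._send = send
        # Bounded buffer: when full, the oldest entries are evicted.
        self._buffer = deque(maxlen=max_entries)

    def add(self, line):
        self._buffer.append(line)

    def flush(self):
        """Try to send everything buffered; keep it all if the send fails."""
        if not self._buffer:
            return True
        batch = list(self._buffer)
        try:
            self._send(batch)
        except Exception:
            return False  # backend unavailable: retry on the next flush
        # Drop only the entries that were just sent (single-threaded sketch).
        for _ in batch:
            self._buffer.popleft()
        return True

    def pending(self):
        return len(self._buffer)
```

Note that the bound is a trade-off: once the buffer is full, the oldest lines are lost, so for logs that must never be dropped you would size it (or back it with disk) for the longest outage you want to survive.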

For really critical use cases, I’d recommend considering an intermediary layer such as Kafka, something that is designed to stream massive amounts of data efficiently, and letting that be your temporary storage should something happen to your Loki cluster.


It’s important to monitor Loki, especially in critical use cases like this one.
Then you can set alerts that fire if an error like the one mentioned happens.
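
For example, Loki exposes a loki_discarded_samples_total counter with a reason label on its /metrics endpoint, and I believe hitting the stream limit shows up there as reason="stream_limit" (check the exact label values on your version). Normally you would alert on this in Prometheus; the sketch below only shows the idea by summing the counter per reason from metrics text. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches sample lines such as:
#   loki_discarded_samples_total{reason="stream_limit",tenant="x"} 3
_LINE = re.compile(
    r'^loki_discarded_samples_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_REASON = re.compile(r'reason="([^"]*)"')


def discarded_by_reason(metrics_text):
    """Sum discarded samples per `reason` label from Prometheus text output."""
    totals = defaultdict(float)
    for raw in metrics_text.splitlines():
        m = _LINE.match(raw.strip())
        if not m:
            continue  # comments, other metrics, blank lines
        reason = _REASON.search(m.group("labels"))
        if reason:
            totals[reason.group(1)] += float(m.group("value"))
    return dict(totals)
```

Any non-zero and growing total for a reason you care about means logs are being rejected, which is exactly the situation you want an alert for.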