Bringing logs from the Linux audit subsystem into Loki

I’m currently evaluating the best way to ship logs generated by the kernel audit subsystem into Loki (see also this post from my colleague on the fediverse), as we’re in the process of retiring our existing stack based on Elastic’s auditbeat and Graylog.

We’re not too convinced by the reference implementation, auditd: the log format it produces isn’t really amenable to analysis in Loki without modification, and it’s not a codebase we’d be comfortable letting access the network directly. We’ve found Slack’s go-audit promising, as it isn’t a hairy C codebase and it outputs audit logs in JSON, which is already much more Loki-friendly. go-audit also supports sending audit logs over the network, and while it doesn’t support the Loki push protocol, that seems like it would be fairly straightforward to add.
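For context, the core of what a reader like go-audit does is conceptually small. Below is a rough sketch of that loop, not go-audit’s actual code: open a NETLINK_AUDIT socket and print each kernel message as a JSON line. A real reader must additionally register itself as the audit event consumer (an AUDIT_SET message), reassemble multi-part events, and parse the key=value payload, all of which is elided here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"syscall"
)

func main() {
	// Open a raw netlink socket to the kernel audit subsystem (Linux, root).
	fd, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW, syscall.NETLINK_AUDIT)
	if err != nil {
		log.Fatal(err)
	}
	if err := syscall.Bind(fd, &syscall.SockaddrNetlink{Family: syscall.AF_NETLINK}); err != nil {
		log.Fatal(err)
	}
	// NOTE: a real reader would send an AUDIT_SET message here to register
	// its PID as the audit event consumer; elided for brevity.
	buf := make([]byte, 64*1024)
	for {
		n, _, err := syscall.Recvfrom(fd, buf, 0)
		if err != nil {
			log.Fatal(err)
		}
		msgs, err := syscall.ParseNetlinkMessage(buf[:n])
		if err != nil {
			continue
		}
		for _, m := range msgs {
			// Emit one JSON object per kernel message; real code would
			// reassemble multi-part events and parse the key=value pairs.
			line, _ := json.Marshal(map[string]interface{}{
				"type": m.Header.Type,
				"msg":  string(m.Data),
			})
			fmt.Println(string(line))
		}
	}
}
```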

In our current stack, the audit logs are first written to a file on disk by one process, and that file is then tailed by a second process which uploads the logs to the log server. The gap between the audit logs being written and being shipped means the pipeline is not entirely tamper-proof, and as we’re reimplementing our auditing stack it would be great if we could find a way to close it. Ideally the process reading the audit logs from the kernel should also be the one which manages shipping them to the log server.

One option would be to simply bolt the promtail client library (or a hand-written implementation of the JSON push API) into go-audit, though this might not be easy to make sufficiently robust to be useful (e.g. how do you handle the Loki server being unavailable because it’s down for a kernel update?). An alternative approach would be to come at this from the opposite direction and instead teach Alloy to read audit logs from the kernel, as it already knows how to be a well-behaved log-shipping client.
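To make that robustness question concrete, here is a minimal sketch, with illustrative names only, of the least a hand-rolled client would need around each push: capped exponential backoff, plus an explicit decision about what happens when retries run out.

```go
package lokiship // hypothetical package name

import (
	"errors"
	"time"
)

// pushWithRetry retries one push with capped exponential backoff. What to
// do when retries run out (drop the batch? spill to disk? block the audit
// reader and risk losing events?) is exactly the hard design question,
// and is left open here.
func pushWithRetry(push func() error) error {
	backoff := time.Second
	for attempt := 0; attempt < 8; attempt++ {
		if err := push(); err == nil {
			return nil
		}
		time.Sleep(backoff)
		if backoff < time.Minute {
			backoff *= 2 // cap growth so retries keep happening
		}
	}
	return errors.New("loki unreachable after repeated attempts")
}
```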

I’m curious whether and how other people have solved the problem of getting audit logs into Loki. I’d also be willing to break out a Go compiler and write some code so that our audit logs can reach Loki without first having to go through a file on disk (or through the systemd journal, etc.).

You can send it directly from your tool of choice (say go-audit) to Loki via its API. Loki’s API is pretty straightforward to use; it won’t take much effort to code something functional. You could contribute this back to their repo, too.
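As a sketch of how simple the endpoint is: Loki accepts a JSON body at POST /loki/api/v1/push, with timestamps as unix-nanosecond strings. The package and function names below are illustrative; a real client would batch, compress, and retry rather than sending one request per line.

```go
package lokiship // hypothetical package name

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// The JSON body shape of POST /loki/api/v1/push: streams of
// [timestamp, line] pairs, with timestamps as unix-nanosecond strings.
type pushBody struct {
	Streams []stream `json:"streams"`
}

type stream struct {
	Stream map[string]string `json:"stream"` // the label set
	Values [][2]string       `json:"values"` // [ts, line] pairs
}

// pushLine ships a single log line; a real client would batch many lines
// per request.
func pushLine(lokiURL string, labels map[string]string, line string) error {
	body, err := json.Marshal(pushBody{Streams: []stream{{
		Stream: labels,
		Values: [][2]string{{fmt.Sprintf("%d", time.Now().UnixNano()), line}},
	}}})
	if err != nil {
		return err
	}
	resp, err := http.Post(lokiURL+"/loki/api/v1/push", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 { // Loki replies 204 No Content on success
		return fmt.Errorf("loki push failed: %s", resp.Status)
	}
	return nil
}
```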

You can also send it to an Alloy agent running either locally or somewhere on the same network, again via Loki’s API. The advantage of this is letting Alloy handle the write-ahead log, buffering, backoff, and any other client-side configuration that may be useful if you have a lot of audit logs.

(Colleague of Molly here)

Something we’ve been unsure about: Alloy seems to be relatively heavyweight. You’re mentioning a pattern I was already half-considering: running Alloy in strategic places instead of on every host or VM. However, it does feel like the overall picture becomes a bit blurry that way, as the tools bridging that gap become fragmented again…

I only suggested this because of Molly’s concern about audit logs potentially being tampered with after they’re processed by your local tools but before they’re picked up by Alloy, which implied you’d want a solution that leaves nothing behind once the original logs are processed, to limit the risk. Honestly, I don’t think it’s much of a risk, and if we set that aside then I think it’s perfectly fine to run Alloy on every VM.

I do agree Alloy is a bit on the heavy side compared to something like promtail; I’m sure there will be further optimizations in the future. That said, it hasn’t been a problem for us, but if you’re running micro VMs you could also consider something like Fluent Bit, which is deliberately designed to be lightweight.

Yeah, the Loki API is pretty simple, which is why I was considering retrofitting a Loki client into go-audit.

It had also occurred to me that we could use Alloy as a Loki proxy on the same host and let Alloy handle the problems of getting the data over the network to Loki properly. We would still have multiple moving parts on the host side, but short of writing an audit subsystem data source for Loki ourselves, that seems unavoidable at this point. (In an ideal world Alloy would be able to bind a listener for the Loki API to a Unix socket so we could apply access control properly, instead of just using a TCP port on the loopback interface…)
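To illustrate what we mean (this is not an existing Alloy feature, and the socket path is hypothetical): binding the push endpoint to a Unix socket would let ordinary filesystem permissions decide who may write logs.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"os"
)

func main() {
	const sock = "/run/alloy/loki-push.sock" // hypothetical path
	_ = os.Remove(sock)                      // clean up a stale socket
	ln, err := net.Listen("unix", sock)
	if err != nil {
		log.Fatal(err)
	}
	_ = os.Chmod(sock, 0o660) // restrict pushers via filesystem permissions

	mux := http.NewServeMux()
	mux.HandleFunc("/loki/api/v1/push", func(w http.ResponseWriter, r *http.Request) {
		// A real proxy would decode the payload and hand it to the
		// shipping pipeline; Loki replies 204 on success.
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.Serve(ln, mux))
}
```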

In the process of writing this reply, it’s occurred to me that in principle we could run go-audit as a subprocess of Alloy: go-audit reads the audit logs, formats them, and prints them on stdout, which Alloy then reads and collects. Alloy doesn’t currently have a Loki source for reading from subprocesses, but that’s maybe something worth implementing and submitting upstream.
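A minimal sketch of what such a subprocess source would do, assuming go-audit is on the PATH and emits one JSON object per line on stdout (forward() is a stand-in for the shipping pipeline):

```go
package main

import (
	"bufio"
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("go-audit") // plus whatever config flags apply
	out, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	sc := bufio.NewScanner(out)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // audit events can be long
	for sc.Scan() {
		forward(sc.Text()) // hand the JSON line to the Loki client
	}
	if err := cmd.Wait(); err != nil {
		log.Printf("go-audit exited: %v", err) // a real source would restart it
	}
}

func forward(line string) { /* ship to Loki */ }
```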

go-audit itself is simple: connect to the netlink socket and format each received message as JSON. I guess you could reuse that code to create an OpenTelemetry receiver. Then you can use the OpenTelemetry Collector Builder to build a light OTel Collector distribution with only the few components you need, e.g. your audit receiver and a Loki exporter (BTW, a Loki exporter exists already, but it’s deprecated and the OTLP exporter should be used instead, so think about which protocol you rely on to stay future-proof).

Alloy is an OTel Collector under the hood, so if you create an OTel receiver, it could also be integrated into Alloy in the future.
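For a sense of the shape of such a receiver, here is a hedged skeleton written against a recent Collector API. The factory signatures have shifted between releases and the “auditlog” component name is made up, so treat this as an outline rather than a drop-in file:

```go
package auditlogreceiver // hypothetical package and component names

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/receiver"
)

// Config would carry audit rules, buffer sizes, and so on.
type Config struct{}

// NewFactory registers a logs-only receiver with the collector.
func NewFactory() receiver.Factory {
	return receiver.NewFactory(
		component.MustNewType("auditlog"),
		func() component.Config { return &Config{} },
		receiver.WithLogs(createLogs, component.StabilityLevelDevelopment),
	)
}

func createLogs(_ context.Context, _ receiver.Settings, _ component.Config, next consumer.Logs) (receiver.Logs, error) {
	return &auditReceiver{next: next}, nil
}

type auditReceiver struct {
	next consumer.Logs
}

func (r *auditReceiver) Start(ctx context.Context, _ component.Host) error {
	// This is where you'd open the NETLINK_AUDIT socket (reusing
	// go-audit's reading code), convert each event to plog.Logs, and
	// hand it to r.next.ConsumeLogs from a goroutine.
	return nil
}

func (r *auditReceiver) Shutdown(context.Context) error { return nil }
```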

We decided to run Alloy on every VM.

At the moment we’re passing the audit data through the loopback interface, and that’s something we’d like to improve. We could make some time to draft a subprocess data source for Alloy if anyone is interested, but we’d like to get community input first.