Multi-tenant Loki and RBAC

Hi there,

I’m new to Loki and trying to architect a solution that aligns access to logs in Grafana with the namespace-based RBAC in our Kubernetes clusters.

High-level setup:

  1. I have Kubernetes clusters running, with various teams accessing each cluster. For instance, a ‘platform’ team has full access to all clusters, but ‘team-a’ only has access to a ‘team-a’ namespace on a specific cluster.
  2. I want to use the k8s-monitoring Helm chart to deploy Alloy (one deployment per cluster, preferably) to send all (cluster, node, pod) logs to a central Loki instance.
  3. Depending on their role, I want users to only be able to see and query logs in Grafana based on their RBAC in K8s.

I’ve been working my way through this with Claude Code, and it gives me two options:

  1. A Helm values.yml that uses tenant IDs (platform, team-a, etc.) and per-tenant Loki destinations (all pointing to the same Loki instance), with some extraConfig magic that routes manually configured namespace entries to those destinations. Pro: can use OSS Loki. Con: a lot of configuration in the Helm values that gets outdated easily as new namespaces and teams are added to these clusters.
  2. Or a simple per-cluster Helm values.yml, and move all of the tenant logic to Loki using LBAC (label-based access control). Pro: simple values.yml, simple deployment. Con: requires enterprise Loki. Con: still have to re-construct the namespace-based RBAC from K8s in Loki, but with LBAC.

Am I making sense here? What would be the best architecture for this?
Joep

Is it absolutely a requirement to control access to specific namespaces? If so, I think it’ll be considerably more difficult.

I am not sure how you’ll do RBAC, so I’ll use the most basic multi-tenant setup as an example: a Loki cluster with read and write paths, and an nginx reverse proxy in front that handles authentication. The authentication is basic auth, with an nginx map configured to map each user to a tenant and supply the X-Scope-OrgID header after authentication.
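
To make the user-to-tenant mapping concrete, here is a minimal Python sketch of the logic the proxy would apply after authentication. It is not nginx configuration, and the user names and the USER_TO_TENANT table are made up for illustration.

```python
# Hypothetical user-to-tenant table; in the setup above this would live in an nginx map.
USER_TO_TENANT = {
    "alice": "platform",
    "bob": "team-a",
}


def scope_headers(user: str) -> dict:
    """Return the header the proxy would add for an already authenticated user."""
    tenant = USER_TO_TENANT.get(user)
    if tenant is None:
        # An unmapped user gets no tenant, so the request should be rejected.
        raise PermissionError(f"no tenant mapped for user {user!r}")
    return {"X-Scope-OrgID": tenant}
```

Every new team or user means a new entry in that table, which is the maintenance cost discussed in the rest of this reply.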

First, let’s consider ingestion. If you have to create one org per namespace per cluster, ingestion will be quite messy. Assuming you are using Alloy, I don’t think you can dynamically send logs to different loki.write blocks, which means you’d have to manually adjust the configuration whenever you add a new namespace, or you’d have to deploy one set of Alloy daemon containers per namespace. Neither is feasible in my opinion. If you just need one org per cluster, then it’s much easier.
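
As a rough illustration of how the two granularities scale, the Python sketch below lists the tenants you would need for a hypothetical inventory. The cluster names, namespaces, and the "cluster_namespace" naming scheme are all assumptions for the example, not anything Loki or Alloy prescribes.

```python
# Hypothetical inventory: cluster name -> namespaces in that cluster.
NAMESPACES_BY_CLUSTER = {
    "prod": ["team-a", "team-b", "monitoring"],
    "staging": ["team-a"],
}


def tenant_id(cluster, namespace=None):
    """One tenant per cluster, or per namespace per cluster when a namespace is given."""
    # "_" is an assumed separator; Kubernetes namespace names cannot contain it.
    return cluster if namespace is None else f"{cluster}_{namespace}"


def tenants(inventory, per_namespace):
    """List every tenant ID needed for the chosen granularity."""
    if not per_namespace:
        return sorted(tenant_id(cluster) for cluster in inventory)
    return sorted(
        tenant_id(cluster, ns)
        for cluster, namespaces in inventory.items()
        for ns in namespaces
    )
```

With one tenant per cluster the list only changes when a cluster is added; with one tenant per namespace it changes whenever any team creates a namespace, and each change has to be reflected in the collector configuration.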

When reading logs you’d run into the same problem: because you have to map users to org IDs, you’ll have to carefully maintain the mapping on nginx. If you think about it, on the read path cluster-level and namespace-level access are actually the same problem, just that one is harder to maintain than the other. With access control at the cluster level you can probably get away with semi-automation, but if you need control at the namespace level you’d need to invest some effort.
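
For the read path, a user who may see several tenants needs more than a one-to-one map. The Python sketch below assumes Loki's multi-tenant query support is enabled (multi_tenant_queries_enabled on the querier), in which case X-Scope-OrgID can carry several tenant IDs separated by "|". The grants table and tenant names are hypothetical.

```python
# Hypothetical read grants: user -> tenants that user may query.
USER_TENANTS = {
    "alice": ["prod_team-b", "prod_team-a"],
    "bob": ["prod_team-a"],
}


def read_scope(user: str) -> str:
    """Build the X-Scope-OrgID value for a read request made by this user."""
    granted = USER_TENANTS.get(user, [])
    if not granted:
        raise PermissionError(f"user {user!r} may not read any tenant")
    # De-duplicate and sort so the header value is stable between requests.
    return "|".join(sorted(set(granted)))
```

The grants table is the piece that would have to be kept in sync with the Kubernetes RBAC, for example by regenerating it from the cluster's RoleBindings.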

I don’t think it’s impossible, but it’ll depend on how much effort you want to put into it. I think the read path will be easier to handle. The write path is where you will either have to over-provision your Alloy collectors, find some other solution (such as Fluentd or others that may allow you to be more dynamic), or manage a ton of messy pipelines. It’s definitely easier if you just need to worry about cluster-level access, but if you really need namespace-level access, it’s probably doable with some automation.