I noticed that the K8s monitoring helm chart deploys metric and log collection separately, as in some pods are just for metrics and some are just for log collection. I am considering using the Grafana Alloy helm chart directly for greater customization, so I’ve been reverse engineering the K8s chart a bit to understand why it does what it does (there’s also a chance I just use the k8s monitoring chart, but at this point I’m pretty invested in figuring out how it works). I think I understand why, but I was hoping to get some more insight from the community.
Pod Log Collection
It seems like there are two main ways that Alloy can retrieve logs, either directly from the node’s file system or through the Kubernetes API.
To collect logs from the filesystem, the steps are:
- Use a DaemonSet deployment of Grafana Alloy so there is one pod on each node
- Set `alloy.mounts.dockercontainers: true` in values.yaml so that the pod can access the host docker logs
- Configure Alloy to only retrieve logs from pods on the same node
- Use a relabel rule combined with `local.file_match` to read the logs locally.
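The steps above can be sketched as an Alloy config. This is a rough sketch, not the k8s-monitoring chart's actual pipeline: it assumes the node name is available via the `HOSTNAME` environment variable (the chart may inject it differently), and the Loki URL is a placeholder.

```alloy
// Discover only pods scheduled on this node (DaemonSet pattern).
discovery.kubernetes "pods" {
  role = "pod"
  selectors {
    role  = "pod"
    field = "spec.nodeName=" + env("HOSTNAME") // assumption: node name via env
  }
}

// Build the host log file path from pod metadata.
discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pods.targets
  rule {
    source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
    separator     = "/"
    target_label  = "__path__"
    replacement   = "/var/log/pods/*$1/*.log"
  }
}

// Match the log files on the node's filesystem and tail them.
local.file_match "pod_logs" {
  path_targets = discovery.relabel.pod_logs.output
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push" // placeholder endpoint
  }
}
```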
To collect logs from the Kubernetes API:
- Use a `deployment` or `statefulset` deployment of Grafana Alloy
- Enable clustering with `alloy.clustering.enabled: true` in values.yaml (or you can just use 1 replica)
- Just use `discovery.kubernetes` and `loki.source.kubernetes` in the alloy config without restricting to any specific node
- Enable the clustering block under each `loki.source.kubernetes`
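As a rough sketch of the API-based variant (Loki URL is a placeholder), the clustering block lets multiple Alloy replicas split the discovered pods between them:

```alloy
// Discover all pods cluster-wide; no node restriction needed since logs
// come through the Kubernetes API rather than the local filesystem.
discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]

  // Distribute targets across clustered Alloy instances.
  clustering {
    enabled = true
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push" // placeholder endpoint
  }
}
```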
Metric Collection
I could be wrong here, but I think that when collecting metrics, there isn’t really a benefit to collecting them from the same node Alloy is running on, so there isn’t a good reason to use a DaemonSet.
- Use a `deployment` or `statefulset`
- Enable clustering with `alloy.clustering.enabled: true` in values.yaml (or you can just use 1 replica)
- Just use `discovery.kubernetes` and `prometheus.scrape` in the alloy config without restricting to any specific node
- Enable the clustering block under each `prometheus.scrape`
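The metrics side follows the same shape. A minimal sketch (the remote_write URL is a placeholder, and real setups would add relabel rules to pick which pods to scrape):

```alloy
// Discover all pods cluster-wide.
discovery.kubernetes "pods" {
  role = "pod"
}

prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.default.receiver]

  // Each clustered Alloy instance scrapes its own share of the targets.
  clustering {
    enabled = true
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://mimir/api/v1/push" // placeholder endpoint
  }
}
```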
Tying it all together
It seems to me like you could just have one Grafana Alloy install (deployment or statefulset) that collects both logs and metrics. Separating them out seems to be mainly for the benefit of reading logs directly from the filesystem, which I imagine could be a lot more efficient than reading from the Kubernetes API, but I’m not really sure. The DaemonSet deployment for log collection seems to have the downside of using more resources though, as each node needs to dedicate CPU and memory instead of just having a couple of pods for the whole cluster.
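For the combined approach, a minimal values.yaml sketch for the Grafana Alloy helm chart might look like this. The key names are from the Alloy chart as I understand it, so treat this as an assumption rather than a verified config:

```yaml
# Hypothetical values.yaml for a single clustered Alloy install
# handling both logs and metrics via the Kubernetes API.
controller:
  type: statefulset   # deployment would also work
  replicas: 2
alloy:
  clustering:
    enabled: true
```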