For monitoring Kubernetes clusters, should metrics and log collection be separate deployments?

I noticed that the k8s-monitoring Helm chart deploys metrics and log collection separately: some pods handle only metrics and others only log collection. I'm considering using the Grafana Alloy Helm chart directly for greater customization, so I've been reverse engineering the k8s-monitoring chart to understand why it does what it does (there's also a chance I'll just use the k8s-monitoring chart, but at this point I'm pretty invested in figuring out how it works). I think I understand why, but I was hoping to get some more insight from the community.

Pod Log Collection

It seems like there are two main ways that Alloy can retrieve logs: either directly from the node's filesystem, or through the Kubernetes API.

To collect logs from the filesystem, the steps are:

  1. Use a DaemonSet deployment of Grafana Alloy so there is one pod on each node
  2. Set alloy.mounts.dockercontainers: true in values.yaml so that the pod can access the host docker logs
  3. Configure Alloy to only retrieve logs from pods on the same node
  4. Use a relabel rule combined with local.file_match to read the logs locally.
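The steps above can be sketched roughly as the following Alloy config. This is just a sketch: the Loki endpoint is a placeholder, and it assumes the chart injects the node name into the HOSTNAME env var via the downward API (which the Alloy chart does by default). The relabel rule follows the standard /var/log/pods/<namespace>_<pod>_<uid>/<container>/ path layout:

```alloy
// Discover only pods scheduled on this node.
discovery.kubernetes "pods" {
  role = "pod"

  selectors {
    role  = "pod"
    field = "spec.nodeName=" + sys.env("HOSTNAME") // assumes HOSTNAME = node name
  }
}

// Build the on-disk log path for each container.
discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
    separator     = "/"
    target_label  = "__path__"
    replacement   = "/var/log/pods/*$1/*.log"
  }
}

// Resolve the glob patterns to concrete files and tail them.
local.file_match "pod_logs" {
  path_targets = discovery.relabel.pod_logs.output
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki-gateway/loki/api/v1/push" // placeholder endpoint
  }
}
```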

To collect logs from the Kubernetes API:

  1. Use a Deployment or StatefulSet of Grafana Alloy
  2. Enable clustering with alloy.clustering.enabled: true in values.yaml (or just use a single replica)
  3. Use discovery.kubernetes and loki.source.kubernetes in the Alloy config without restricting to any specific node
  4. Enable the clustering block under each loki.source.kubernetes
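For comparison, a minimal sketch of the API-based variant (again, the Loki endpoint is a placeholder). With the clustering block enabled, the Alloy instances split the discovered targets among themselves:

```alloy
// Discover all pods in the cluster; no node restriction.
discovery.kubernetes "pods" {
  role = "pod"
}

// Tail container logs through the Kubernetes API.
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]

  clustering {
    enabled = true
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki-gateway/loki/api/v1/push" // placeholder endpoint
  }
}
```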

Metric Collection

I could be wrong here, but I think that when collecting metrics there isn't really a benefit to scraping targets from the same node Alloy is running on, so there isn't a good reason to use a DaemonSet.

  1. Use a Deployment or StatefulSet
  2. Enable clustering with alloy.clustering.enabled: true in values.yaml (or just use a single replica)
  3. Use discovery.kubernetes and prometheus.scrape in the Alloy config without restricting to any specific node
  4. Enable the clustering block under each prometheus.scrape
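A sketch of the metrics side, mirroring the log config above (the Mimir URL is a placeholder; with clustering enabled, scrape targets are distributed across the Alloy replicas):

```alloy
discovery.kubernetes "pods" {
  role = "pod"
}

// Pull-based scraping; targets are sharded across cluster members.
prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://mimir/api/v1/push" // placeholder endpoint
  }
}
```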

Tying it all together

It seems to me like you could have a single Grafana Alloy install (Deployment or StatefulSet) that collects both logs and metrics. Separating them seems to exist mainly to allow reading logs directly from the filesystem, which I imagine could be a lot more efficient than reading them through the Kubernetes API, but I'm not really sure. The DaemonSet approach to log collection does have the downside of using more resources, though, since every node has to dedicate CPU and memory instead of the whole cluster sharing just a couple of pods.

I think the k8s-monitoring Helm chart still uses node-exporter for host-level metrics collection, but I could be wrong.

In any case, you don't necessarily need to separate logs from metrics, but you may have to separate node-local concerns (pod logs, host metrics) from cluster-wide concerns (cluster event logs, pull-based metrics).

For example, say you want to use a DaemonSet to collect logs locally, and you also want to collect cluster event logs from the API. You don't necessarily want to cluster the DaemonSet Alloy instances, so you might run two installations: one for local log collection without clustering, and one that collects logs from the API and is clustered.
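In Helm terms that could look something like the following two values files for the grafana/alloy chart (file names are made up; the keys are the chart's controller.type, controller.replicas, alloy.mounts.dockercontainers, and alloy.clustering.enabled values):

```yaml
# values-logs-daemonset.yaml (hypothetical name)
# One pod per node, reads logs from the host filesystem, no clustering.
controller:
  type: daemonset
alloy:
  mounts:
    dockercontainers: true

---
# values-cluster.yaml (hypothetical name)
# A small clustered set for API-based, cluster-wide collection.
controller:
  type: statefulset
  replicas: 2
alloy:
  clustering:
    enabled: true
```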

I use k8s-monitoring helm chart, but if I were to do it from the ground up I would probably do something like this:

  1. An Alloy DaemonSet for both pod logs and host metrics.
  2. An Alloy Deployment (or StatefulSet), clustered, that does the following:
  • Cluster event logs from the API
  • Metrics scraping based on service discovery for app endpoints
  • Also functions a bit like a Prometheus server: receives metrics from the DaemonSet Alloys and forwards them to Mimir (you don't have to do this, but it's simpler in terms of configuration)
  • Can do the same proxy function for logs too
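A rough sketch of what the clustered tier could look like, assuming you use loki.source.kubernetes_events for cluster events and prometheus.receive_http as the push endpoint for the DaemonSet Alloys (service names, ports, and the remote-write push path are assumptions, check the component docs for the exact paths):

```alloy
// --- Clustered Alloy tier ---

// Cluster event logs from the Kubernetes API.
loki.source.kubernetes_events "events" {
  forward_to = [loki.write.default.receiver]
}

// Receive metrics pushed by the DaemonSet Alloys and forward to Mimir.
prometheus.receive_http "from_daemonset" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 9999
  }
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir/api/v1/push" // placeholder endpoint
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki-gateway/loki/api/v1/push" // placeholder endpoint
  }
}
```

On the DaemonSet side, the host-metrics pipeline would then remote_write to the clustered tier's Service instead of straight to Mimir (the URL below is an assumed in-cluster address):

```alloy
// --- DaemonSet Alloy side ---
prometheus.remote_write "to_cluster" {
  endpoint {
    url = "http://alloy-cluster.monitoring.svc:9999/api/v1/metrics/write" // assumed address
  }
}
```

The same proxy pattern should work for logs with loki.source.api on the receiving side.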