Using discovery.kubernetes, Alloy produces a lot of "failed to create fsnotify watcher: too many open files"

I just switched to Grafana Alloy and set up Kubernetes log collection as shown in the documentation. The logs (or at least most of them?) get collected and forwarded to Loki, but all containers constantly log "failed to create fsnotify watcher: too many open files", including the Alloy containers.
It’s a small 6-node cluster with fewer than 80 pods running on it.

I installed Alloy using the latest Helm chart and the following config:

alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap.  This is passed into `tpl` allowing for templating from values.
    content: |-
      logging {
        level = "info"
        format = "logfmt"
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-svc.logging.svc.cluster.local:3100/loki/api/v1/push"
        }
      }

      // local.file_match discovers files on the local filesystem using glob patterns and the doublestar library. It returns an array of file paths.
      local.file_match "node_logs" {
        path_targets = [{
            // Monitor syslog to scrape node-logs
            __path__  = "/var/log/syslog",
            job       = "node/syslog",
            node_name = env("HOSTNAME"),
            cluster   = "testkube",
        }]
      }

      // loki.source.file reads log entries from files and forwards them to other loki.* components.
      // You can specify multiple loki.source.file components by giving them different labels.
      loki.source.file "node_logs" {
        targets    = local.file_match.node_logs.targets
        forward_to = [loki.write.default.receiver]
      }

      // discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
      // It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
      discovery.kubernetes "pod" {
        role = "pod"
        selectors {
          role = "pod"
          field = "spec.nodeName=" + coalesce(env("HOSTNAME"), constants.hostname)
        }
      }

      // discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
      // If no rules are defined, then the input targets are exported as-is.
      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pod.targets

        // Label creation - "namespace" field from "__meta_kubernetes_namespace"
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          action = "replace"
          target_label = "namespace"
        }

        // Label creation - "pod" field from "__meta_kubernetes_pod_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          action = "replace"
          target_label = "pod"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "container"
        }

        // Label creation -  "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
          action = "replace"
          target_label = "app"
        }

        // Label creation -  "job" field from "__meta_kubernetes_namespace" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_namespace/__meta_kubernetes_pod_container_name
        rule {
          source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "job"
          separator = "/"
          replacement = "$1"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_uid" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name.log
        rule {
          source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "__path__"
          separator = "/"
          replacement = "/var/log/pods/*$1/*.log"
        }

        // Label creation -  "container_runtime" field from "__meta_kubernetes_pod_container_id"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_id"]
          action = "replace"
          target_label = "container_runtime"
          regex = "^(\\S+):\\/\\/.+$"
          replacement = "$1"
        }
      }

      // loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
      loki.source.kubernetes "pod_logs" {
        targets    = discovery.relabel.pod_logs.output
        forward_to = [loki.process.pod_logs.receiver]
      }

      // loki.process receives log entries from other Loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component’s arguments.
      loki.process "pod_logs" {
        stage.static_labels {
            values = {
              cluster = "testkube",
            }
        }
        forward_to = [loki.write.default.receiver]
      }

      // loki.source.kubernetes_events tails events from the Kubernetes API and converts them
      // into log lines to forward to other Loki components.
      loki.source.kubernetes_events "cluster_events" {
        job_name   = "integrations/kubernetes/eventhandler"
        log_format = "logfmt"
        forward_to = [
          loki.process.cluster_events.receiver,
        ]
      }

      // loki.process receives log entries from other loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component’s arguments.
      loki.process "cluster_events" {
        forward_to = [loki.write.default.receiver]

        stage.static_labels {
          values = {
            cluster = "testkube",
          }
        }

        stage.labels {
          values = {
            kubernetes_cluster_events = "job",
          }
        }
      }

I came across this issue and added

        selectors {
          role = "pod"
          field = "spec.nodeName=" + coalesce(env("HOSTNAME"), constants.hostname)
        }

to the discovery.kubernetes component, but this proposed fix, which restricts each DaemonSet pod to tailing only the pods on its own node, did not solve it for me.
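One thing worth double-checking with that selector: inside a container, HOSTNAME is typically the pod name rather than the node name, unless the pod uses the host network or the chart overrides it. A minimal sketch for making the node name explicit (assuming the chart version in use exposes alloy.extraEnv; check the chart's values.yaml, and NODE_NAME is just an illustrative name):

alloy:
  extraEnv:
    # Illustrative variable name; exposes the node name to Alloy via the downward API.
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

The field selector could then use env("NODE_NAME") instead of env("HOSTNAME").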

I could use some help troubleshooting this.

That sounds like a file-limit issue with Kubernetes and your container runtime, not an Alloy problem. I’d recommend looking at changing the default rlimit through your container runtime’s configuration.

I understand that increasing the fs.inotify.max_user_watches limit could fix this. The reason I mention it here is that this is just a small setup running a lot of very common container images (cert-manager, Longhorn, Traefik, Alloy, MetalLB, etc.), and they all show the same behaviour.

But the good news is that it looks like it was just an initial flood of log tailing during the first 6-8 hours; currently I don’t see it happening anymore.

Thanks for your support.

the ingress is not configurable?

Sure, but I just started using Alloy and am still in ‘figuring things out’ mode…


Kubernetes offers a way to change fs.inotify.max_user_watches.
See: Using sysctls in a Kubernetes Cluster | Kubernetes
Users could deploy a pod like so:

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: fs.inotify.max_user_watches
      value: "YOUR VALUE HERE"
  containers:
  - name: example
    # Placeholder container so the manifest is complete; the sysctls above apply to the whole pod.
    image: registry.k8s.io/pause:3.9

There is a shortcut in the Helm chart to do that; see .Values.global.podSecurityContext.
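For example, a values snippet along these lines (a sketch only, assuming the chart version in use exposes global.podSecurityContext as described above; the value is only an example):

global:
  podSecurityContext:
    sysctls:
      # Example value; pick one appropriate for your nodes.
      - name: fs.inotify.max_user_watches
        value: "1048576"

Note that fs.inotify.* is not on the kubelet’s safe-sysctls list, so the nodes must explicitly allow it (the kubelet’s allowedUnsafeSysctls setting) before pods are permitted to set it.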


I assume this problem occurs on a per-node basis. By that I mean: if you have six containers on one node the problem appears, but if you spread those six containers over three nodes it would likely go away?

I’m running Alloy locally and have set fs.inotify.max_user_watches to 2099999999, and I am still seeing this problem. Setting this crazy-high value does not even reduce the number of these messages; the frequency is the same.
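For what it’s worth, the "too many open files" variant of this message usually corresponds to the inotify instance limit (fs.inotify.max_user_instances) or the process’s open-file limit rather than to fs.inotify.max_user_watches (exhausting the watch limit tends to surface as "no space left on device"), which would explain why raising only the watch limit changes nothing. A hedged sketch of the sysctl approach from earlier in the thread, extended to cover the instance limit as well (illustrative name and example values only):

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-instances-example
spec:
  securityContext:
    sysctls:
    # Example values; both sysctls are outside the kubelet's safe list and
    # must be allowed via the kubelet's allowedUnsafeSysctls setting.
    - name: fs.inotify.max_user_instances
      value: "1024"
    - name: fs.inotify.max_user_watches
      value: "1048576"
  containers:
  - name: example
    image: registry.k8s.io/pause:3.9  # placeholder container; the sysctls apply pod-wide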

UPDATE

I have found other people complaining about this in connection with MicroK8s. I am wondering if the problem lies there.

It was indeed MicroK8s, or at least the version I was using (1.27). I just ran a vanilla Kubernetes installation and the error logging is gone and the high CPU usage is eliminated. I didn’t even need to increase the watch limit anymore. The problem was my Kubernetes distribution.