Using discovery.kubernetes, Alloy produces a lot of "failed to create fsnotify watcher: too many open files"

I just switched to Grafana Alloy and set up Kubernetes log collection as shown in the documentation. The logs (or at least most of them?) get collected and forwarded to Loki, but all containers constantly log "failed to create fsnotify watcher: too many open files", including the Alloy containers.
It’s a small 6-node cluster with fewer than 80 pods running on it.

I installed Alloy using the latest Helm chart and the following config:

alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap.  This is passed into `tpl` allowing for templating from values.
    content: |-
      logging {
        level = "info"
        format = "logfmt"
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-svc.logging.svc.cluster.local:3100/loki/api/v1/push"
        }
      }

      // local.file_match discovers files on the local filesystem using glob patterns and the doublestar library. It returns an array of file paths.
      local.file_match "node_logs" {
        path_targets = [{
            // Monitor syslog to scrape node-logs
            __path__  = "/var/log/syslog",
            job       = "node/syslog",
            node_name = env("HOSTNAME"),
            cluster   = "testkube",
        }]
      }

      // loki.source.file reads log entries from files and forwards them to other loki.* components.
      // You can specify multiple loki.source.file components by giving them different labels.
      loki.source.file "node_logs" {
        targets    = local.file_match.node_logs.targets
        forward_to = [loki.write.default.receiver]
      }

      // discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
      // It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
      discovery.kubernetes "pod" {
        role = "pod"
        selectors {
          role = "pod"
          field = "spec.nodeName=" + coalesce(env("HOSTNAME"), constants.hostname)
        }
      }

      // discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
      // If no rules are defined, then the input targets are exported as-is.
      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pod.targets

        // Label creation - "namespace" field from "__meta_kubernetes_namespace"
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          action = "replace"
          target_label = "namespace"
        }

        // Label creation - "pod" field from "__meta_kubernetes_pod_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          action = "replace"
          target_label = "pod"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "container"
        }

        // Label creation -  "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
          action = "replace"
          target_label = "app"
        }

        // Label creation -  "job" field from "__meta_kubernetes_namespace" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_namespace/__meta_kubernetes_pod_container_name
        rule {
          source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "job"
          separator = "/"
          replacement = "$1"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_uid" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name.log
        rule {
          source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "__path__"
          separator = "/"
          replacement = "/var/log/pods/*$1/*.log"
        }

        // Label creation -  "container_runtime" field from "__meta_kubernetes_pod_container_id"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_id"]
          action = "replace"
          target_label = "container_runtime"
          regex = "^(\\S+):\\/\\/.+$"
          replacement = "$1"
        }
      }

      // loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
      loki.source.kubernetes "pod_logs" {
        targets    = discovery.relabel.pod_logs.output
        forward_to = [loki.process.pod_logs.receiver]
      }

      // loki.process receives log entries from other Loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component’s arguments.
      loki.process "pod_logs" {
        stage.static_labels {
            values = {
              cluster = "testkube",
            }
        }
        forward_to = [loki.write.default.receiver]
      }

      // loki.source.kubernetes_events tails events from the Kubernetes API and converts them
      // into log lines to forward to other Loki components.
      loki.source.kubernetes_events "cluster_events" {
        job_name   = "integrations/kubernetes/eventhandler"
        log_format = "logfmt"
        forward_to = [
          loki.process.cluster_events.receiver,
        ]
      }

      // loki.process receives log entries from other loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component’s arguments.
      loki.process "cluster_events" {
        forward_to = [loki.write.default.receiver]

        stage.static_labels {
          values = {
            cluster = "testkube",
          }
        }

        stage.labels {
          values = {
            kubernetes_cluster_events = "job",
          }
        }
      }

I came across this issue and added

        selectors {
          role = "pod"
          field = "spec.nodeName=" + coalesce(env("HOSTNAME"), constants.hostname)
        }

to the discovery.kubernetes component, but this proposed fix, which restricts each DaemonSet pod to tailing only the pods on its own node, did not solve it for me.
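One thing worth double-checking with that selector: inside a container, HOSTNAME is typically the pod name rather than the node name, unless the pod uses the host network or the chart overrides it. A minimal sketch for making the node name explicit (assuming the chart version in use exposes alloy.extraEnv; check the chart's values.yaml, and NODE_NAME is just an illustrative name):

alloy:
  extraEnv:
    # Illustrative variable name; exposes the node name to Alloy via the downward API.
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

The field selector could then use env("NODE_NAME") instead of env("HOSTNAME").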

I could use some help troubleshooting this.

That sounds like a file-limit issue with Kubernetes and your container runtime, not an Alloy problem. I’d recommend looking at changing the default rlimit through your container runtime’s configuration.

I understand that increasing the fs.inotify.max_user_watches limit could fix this. The reason I mention it here is that this is just a small setup running a lot of very common container images (cert-manager, Longhorn, Traefik, Alloy, MetalLB, etc.), and they all show the same behaviour.

But the good news is that it looks like it was just an initial flood of log tailing during the first 6-8 hours; currently I don’t see it happening anymore.

Thanks for your support.

the ingress is not configurable?

Sure, but I just started using Alloy and am still in ‘figuring things out’ mode…


Kubernetes offers a way to change fs.inotify.max_user_watches.
See: Using sysctls in a Kubernetes Cluster | Kubernetes
Users could deploy a pod like so:

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: fs.inotify.max_user_watches
      value: "YOUR VALUE HERE"
  containers:
  - name: example
    # Placeholder container so the manifest is complete; the sysctls above apply to the whole pod.
    image: registry.k8s.io/pause:3.9

There is a shortcut in the Helm chart to do that; see .Values.global.podSecurityContext.
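For example, a values snippet along these lines (a sketch only, assuming the chart version in use exposes global.podSecurityContext as described above; the value is only an example):

global:
  podSecurityContext:
    sysctls:
      # Example value; pick one appropriate for your nodes.
      - name: fs.inotify.max_user_watches
        value: "1048576"

Note that fs.inotify.* is not on the kubelet’s safe-sysctls list, so the nodes must explicitly allow it (the kubelet’s allowedUnsafeSysctls setting) before pods are permitted to set it.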


I assume this problem occurs on a per-node basis. By that I mean: if you have six containers on one node the problem appears, but if you spread those six containers over three nodes it would likely go away?

I’m running Alloy locally and have set fs.inotify.max_user_watches to 2099999999, and I am still seeing this problem. Setting this crazy-high value does not even reduce the number of these messages; the frequency is the same.
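For what it’s worth, the "too many open files" variant of this message usually corresponds to the inotify instance limit (fs.inotify.max_user_instances) or the process’s open-file limit rather than to fs.inotify.max_user_watches (exhausting the watch limit tends to surface as "no space left on device"), which would explain why raising only the watch limit changes nothing. A hedged sketch of the sysctl approach from earlier in the thread, extended to cover the instance limit as well (illustrative name and example values only):

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-instances-example
spec:
  securityContext:
    sysctls:
    # Example values; both sysctls are outside the kubelet's safe list and
    # must be allowed via the kubelet's allowedUnsafeSysctls setting.
    - name: fs.inotify.max_user_instances
      value: "1024"
    - name: fs.inotify.max_user_watches
      value: "1048576"
  containers:
  - name: example
    image: registry.k8s.io/pause:3.9  # placeholder container; the sysctls apply pod-wide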

UPDATE

I have found other people complaining about this in connection with MicroK8s. I am wondering if the problem lies there.

It was indeed MicroK8s, or at least the version I was using (1.27). I just ran a vanilla Kubernetes installation and the error logging is gone and the high CPU usage is eliminated. I didn’t even need to increase the watch limit anymore. The problem was my Kubernetes distribution.