I just switched to Grafana Alloy and set up Kubernetes log collection as shown in the documentation. The logs (or at least most of them?) get collected and forwarded to Loki, but every container, including the Alloy containers themselves, constantly logs:

failed to create fsnotify watcher: too many open files
It’s a small 6-node cluster with fewer than 80 pods running on it.
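From what I've read, fsnotify on Linux is a wrapper around inotify, and this particular message means the kernel refused to create a new inotify instance, so the limit in play should be fs.inotify.max_user_instances on the node (often 128 by default) rather than any container file-descriptor ulimit. As far as I understand, loki.source.kubernetes opens a follow stream per container and the kubelet creates an fsnotify watcher for each followed log file, which would explain why the message shows up in every container's stream at once. A throwaway pod like this should show a node's current limits (the pod name and nodeName are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: inotify-check            # placeholder name
spec:
  nodeName: my-node-1            # placeholder: the node to inspect
  restartPolicy: Never
  containers:
    - name: check
      image: busybox
      # Containers share the node's kernel, so these files reflect the
      # node-level inotify sysctls.
      command: ["sh", "-c", "cat /proc/sys/fs/inotify/max_user_instances /proc/sys/fs/inotify/max_user_watches"]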
I installed Alloy using the latest Helm chart and the following config:
alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap. This is passed into `tpl` allowing for templating from values.
    content: |-
      logging {
        level  = "info"
        format = "logfmt"
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-svc.logging.svc.cluster.local:3100/loki/api/v1/push"
        }
      }

      // local.file_match discovers files on the local filesystem using glob patterns and the doublestar library. It returns an array of file paths.
      local.file_match "node_logs" {
        path_targets = [{
          // Monitor syslog to scrape node-logs
          __path__  = "/var/log/syslog",
          job       = "node/syslog",
          node_name = env("HOSTNAME"),
          cluster   = "testkube",
        }]
      }

      // loki.source.file reads log entries from files and forwards them to other loki.* components.
      // You can specify multiple loki.source.file components by giving them different labels.
      loki.source.file "node_logs" {
        targets    = local.file_match.node_logs.targets
        forward_to = [loki.write.default.receiver]
      }

      // discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
      // It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
      discovery.kubernetes "pod" {
        role = "pod"
        selectors {
          role  = "pod"
          field = "spec.nodeName=" + coalesce(env("HOSTNAME"), constants.hostname)
        }
      }

      // discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
      // If no rules are defined, then the input targets are exported as-is.
      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pod.targets

        // Label creation - "namespace" field from "__meta_kubernetes_namespace"
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          action        = "replace"
          target_label  = "namespace"
        }

        // Label creation - "pod" field from "__meta_kubernetes_pod_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          action        = "replace"
          target_label  = "pod"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          action        = "replace"
          target_label  = "container"
        }

        // Label creation - "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
          action        = "replace"
          target_label  = "app"
        }

        // Label creation - "job" field from "__meta_kubernetes_namespace" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_namespace/__meta_kubernetes_pod_container_name
        rule {
          source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
          action        = "replace"
          target_label  = "job"
          separator     = "/"
          replacement   = "$1"
        }

        // Label creation - "__path__" field from "__meta_kubernetes_pod_uid" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name.log
        rule {
          source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
          action        = "replace"
          target_label  = "__path__"
          separator     = "/"
          replacement   = "/var/log/pods/*$1/*.log"
        }

        // Label creation - "container_runtime" field from "__meta_kubernetes_pod_container_id"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_id"]
          action        = "replace"
          target_label  = "container_runtime"
          regex         = "^(\\S+):\\/\\/.+$"
          replacement   = "$1"
        }
      }

      // loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
      loki.source.kubernetes "pod_logs" {
        targets    = discovery.relabel.pod_logs.output
        forward_to = [loki.process.pod_logs.receiver]
      }

      // loki.process receives log entries from other Loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component's arguments.
      loki.process "pod_logs" {
        stage.static_labels {
          values = {
            cluster = "testkube",
          }
        }
        forward_to = [loki.write.default.receiver]
      }

      // loki.source.kubernetes_events tails events from the Kubernetes API and converts them
      // into log lines to forward to other Loki components.
      loki.source.kubernetes_events "cluster_events" {
        job_name   = "integrations/kubernetes/eventhandler"
        log_format = "logfmt"
        forward_to = [
          loki.process.cluster_events.receiver,
        ]
      }

      // loki.process receives log entries from other loki components, applies one or more processing stages,
      // and forwards the results to the list of receivers in the component's arguments.
      loki.process "cluster_events" {
        forward_to = [loki.write.default.receiver]
        stage.static_labels {
          values = {
            cluster = "testkube",
          }
        }
        stage.labels {
          values = {
            kubernetes_cluster_events = "job",
          }
        }
      }
I came across this issue and added

selectors {
  role  = "pod"
  field = "spec.nodeName=" + coalesce(env("HOSTNAME"), constants.hostname)
}

to the discovery.kubernetes component, but this proposed fix, which restricts log tailing to the pods running on the same node as each Alloy instance, did not solve it for me.
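One thing I'm unsure about with this selector: inside a pod, HOSTNAME normally resolves to the pod name, not the node name, so spec.nodeName=$HOSTNAME might not match anything unless the node name is injected explicitly. Assuming the grafana/alloy chart passes extraEnv entries through to the Alloy container (I haven't verified the exact value name), something like this should pin it via the downward API:

alloy:
  extraEnv:
    # Downward API: set HOSTNAME to the name of the node this DaemonSet
    # pod runs on, so the spec.nodeName field selector above matches.
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName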
I could use some help troubleshooting this.
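For completeness: the usual workaround I've found for this exact message elsewhere (it also shows up with kubectl logs -f on busy nodes) is raising the inotify limits on the nodes, either in the node image / sysctl config or with a privileged DaemonSet along these lines (a sketch with illustrative values, not something from the Alloy docs):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: inotify-limits           # placeholder name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: inotify-limits
  template:
    metadata:
      labels:
        name: inotify-limits
    spec:
      # A privileged init container bumps the node-level sysctls once per
      # node; the numbers are illustrative, not recommendations.
      initContainers:
        - name: sysctl
          image: busybox
          securityContext:
            privileged: true
          command:
            - sh
            - -c
            - |
              echo 512 > /proc/sys/fs/inotify/max_user_instances
              echo 524288 > /proc/sys/fs/inotify/max_user_watches
      containers:
        # Keep the pod running so the DaemonSet stays healthy after init.
        - name: pause
          image: registry.k8s.io/pause:3.9

That said, I'd rather understand why Alloy exhausts the default limits on a cluster this small than just raise them.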