Kubernetes Monitoring Configuration - No Data Issue

  • What Grafana version and what operating system are you using?
    Grafana Cloud - v9.0.5-d1f1041
    Grafana Agent image - grafana/agent:v0.24.0

  • What are you trying to achieve?
    Configuring Grafana K8s Monitoring

  • How are you trying to achieve it?
    Followed the Agent Grafana Config instructions from the Grafana Kubernetes menu

  • What happened?
    I configured Grafana Agent using Terraform, but no data is displayed in the prebuilt Kubernetes Monitoring dashboards.
    In the dashboard's Cluster dropdown menu, the only option is None. However, if I manually type the name of my cluster (devops), I can see the Events (beta) logs temporarily.
    The Grafana Agent pod logs show no error messages and look fine, but no metrics are collected.

  • What did you expect to happen?
    To see all the metrics and logs being displayed from the prebuilt K8s monitoring dashboards

  • Can you copy/paste the configuration(s) that you are having problems with?
    Terraform is used for configuration. In place of the agent StatefulSet, a kubernetes_deployment with kubernetes_persistent_volume resources is used, because we already have an existing Grafana Agent. So I've added the K8s monitoring config to the existing deployment resource.

resource "kubernetes_cluster_role" "grafana_agent" {
  metadata {
    name = "grafana-agent"
  }

  rule {
    api_groups = [""]
    resources  = ["nodes", "nodes/proxy", "services", "endpoints", "pods", "events"]
    verbs      = ["get", "list", "watch"]
  }

  rule {
    non_resource_urls = ["/metrics"]
    verbs             = ["get"]
  }
}

resource "kubernetes_cluster_role_binding" "grafana_agent" {
  metadata {
    name = "grafana-agent-binding"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "grafana-agent"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "${service_account}" # existing service account
    namespace = "${namespace}"
  }
}

resource "kubernetes_service" "grafana_agent" {
  metadata {
    name      = "grafana-agent-service"
    namespace = "${namespace}"
    labels = {
      name = "grafana-agent"
    }
  }

  spec {
    type             = "ClusterIP"
    session_affinity = "None"

    selector = {
      app  = "grafana-agent"
      name = "grafana-agent"

    }

    port {
      name        = "grafana-agent-http-metrics"
      port        = "80"
      target_port = "80"
    }
  }
}

resource "helm_release" "kube_state_metrics" {
  name        = "kube-state-metrics"
  chart       = "kube-state-metrics"
  description = "Kube State Metrics"
  version     = "4.13.0" # https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics
  repository  = "https://prometheus-community.github.io/helm-charts"
  namespace   = "${namespace}"
}
resource "kubernetes_deployment" "grafana_agent" {
  metadata {
    name      = "grafana-agent"
    namespace = "${namespace}"

    labels = {
      app = "grafana-agent"
    }
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app  = "grafana-agent"
        name = "grafana-agent"
      }
    }

    template {
      metadata {
        labels = {
          app  = "grafana-agent"
          name = "grafana-agent"
        }
      }
      spec {
        volume {
          name = "grafana-data"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.grafana_agent.metadata[0].name
          }
        }

        volume {
          name = "agent-wal"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.agent_wal.metadata[0].name
          }
        }

        volume {
          name = "grafana-agent-config-volume"
          config_map {
            name = "grafana-agent-config"
          }
        }
        container {
          volume_mount {
            name       = "grafana-agent-config-volume"
            mount_path = "/etc/agent"
          }
          volume_mount {
            name       = "grafana-data"
            mount_path = "/etc/agent/data"
          }
          volume_mount {
            name       = "agent-wal"
            mount_path = "/var/lib/agent"
          }
          name    = "grafana-agent"
          image   = var.grafana_container # grafana/agent:v0.24.0
          command = ["/bin/agent"]

          env {
            name = "HOSTNAME"
            value_from {
              field_ref {
                field_path = "spec.nodeName"
              }
            }
          }

          port {
            container_port = 80
            name           = "http-metrics"
          }

          args = [
            "--config.file=/etc/agent/agent.yaml",
            "--enable-features=integrations-next",
            "--server.http.address=0.0.0.0:80"
          ]

        }
        service_account_name = "${service_account}"
      }
    }
  }
}

data "template_file" "grafana_config" {
  template = file("${path.cwd}/agent.yaml")
  vars = {
    api_key      = "${api_key}"
    logs_api_key = "${logs_api_key}"
  }
}

resource "kubernetes_config_map" "grafana_agent_config" {
  metadata {
    name      = "grafana-agent-config"
    namespace = "${namespace}"
  }

  data = {
    "agent.yaml" = data.template_file.grafana_config.rendered
  }
}

resource "google_compute_disk" "grafana_agent" {
  name = "grafana-data"
  type = "pd-standard"
  zone = "${var.region}-a"
  size = 2
}

resource "kubernetes_persistent_volume" "grafana_agent" {
  depends_on = [
    google_compute_disk.grafana_agent
  ]
  metadata {
    name = "grafana-data"
    annotations = {
      "storageclass.kubernetes.io/is-default-class" = "false"
    }
  }
  spec {
    capacity = {
      storage = "2Gi"
    }
    access_modes = ["ReadWriteMany"]
    persistent_volume_source {
      gce_persistent_disk {
        pd_name = google_compute_disk.grafana_agent.name
        fs_type = "ext4"
      }
    }
    storage_class_name = "grafana-data"
  }
}

resource "kubernetes_persistent_volume_claim" "grafana_agent" {
  depends_on = [
    kubernetes_persistent_volume.grafana_agent
  ]
  metadata {
    name      = "grafana-data"
    namespace = "${namespace}"
  }
  spec {
    access_modes = ["ReadWriteMany"]
    resources {
      requests = {
        storage = "2Gi"
      }
    }
    storage_class_name = "grafana-data"
    volume_name        = kubernetes_persistent_volume.grafana_agent.metadata.0.name
  }
}

resource "google_compute_disk" "agent_wal" {
  name = "agent-wal"
  type = "pd-standard"
  zone = "${var.region}-a"
  size = 2
}

resource "kubernetes_persistent_volume" "agent_wal" {
  depends_on = [
    google_compute_disk.agent_wal
  ]
  metadata {
    name = "agent-wal"
    annotations = {
      "storageclass.kubernetes.io/is-default-class" = "false"
    }
  }
  spec {
    capacity = {
      storage = "5Gi"
    }
    access_modes = ["ReadWriteOnce"]
    persistent_volume_source {
      gce_persistent_disk {
        pd_name = google_compute_disk.agent_wal.name
        fs_type = "ext4"
      }
    }
    storage_class_name = "agent-wal"
  }
}

resource "kubernetes_persistent_volume_claim" "agent_wal" {
  depends_on = [
    kubernetes_persistent_volume.agent_wal
  ]
  metadata {
    name      = "agent-wal"
    namespace = "${namespace}"
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "5Gi"
      }
    }
    storage_class_name = "agent-wal"
    volume_name        = kubernetes_persistent_volume.agent_wal.metadata.0.name
  }
}

agent.yaml

metrics:
  wal_directory: /var/lib/agent/wal
  global:
    scrape_interval: 60s
    external_labels:
      cluster: devops
  configs:
  - name: integrations
    remote_write:
      - url: https://prometheus-prod-01-eu-west-0.grafana.net/api/prom/push
        basic_auth:
          username: ${username}
          password: ${api_key}
integrations:
  eventhandler:
    cache_path: /var/lib/agent/eventhandler.cache
    logs_instance: integrations
logs:
  configs:
  - name: integrations
    clients:
    - url: https://logs-prod-eu-west-0.grafana.net/loki/api/v1/push
      basic_auth:
        username: ${username_logs}
        password: ${logs_api_key}
      external_labels:
        cluster: devops
        job: integrations/kubernetes/eventhandler
    positions:
      filename: /tmp/positions.yaml
    target_config:
      sync_period: 10s

Grafana Agent logs

ts=2022-07-22T13:54:05.983587609Z caller=eventhandler.go:185 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Shipped entry" eventRV=411236 eventMsg="Scheduled for sync"
ts=2022-07-22T13:54:05.983832775Z caller=eventhandler.go:185 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Shipped entry" eventRV=411237 eventMsg="Scheduled for sync"
ts=2022-07-22T13:54:13.709209572Z caller=eventhandler.go:185 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Shipped entry" eventRV=411238 eventMsg="ConfigConnector is up to date"
ts=2022-07-22T13:54:14.745250392Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:54:14.745971372Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:54:24.744470171Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:54:24.745144213Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:54:34.744949177Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:54:34.745565863Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:54:44.744412515Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:54:44.745194398Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:54:54.744941093Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:54:54.745566055Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:55:04.744140744Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:55:04.744722138Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:55:14.744969963Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:55:14.745765097Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:55:24.744526605Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:55:24.744984155Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:55:34.745084381Z caller=eventhandler.go:305 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushing last event to disk"
ts=2022-07-22T13:55:34.745632815Z caller=eventhandler.go:329 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Flushed last event to disk"
ts=2022-07-22T13:55:41.133393668Z caller=eventhandler.go:185 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Shipped entry" eventRV=411240 eventMsg="ConfigConnector is up to date"
ts=2022-07-22T13:55:43.625819867Z caller=eventhandler.go:185 level=info component=integrations integration=eventhandler instance=gke-devops-devops-node-pool-160469fc-8tvs:80 msg="Shipped entry" eventRV=411241 eventMsg="ConfigConnector is up to date"

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
    There are no errors but no data is being displayed. I do not even see the cluster name from the prebuilt dashboard.

  • Did you follow any online instructions? If so, what is the URL?
    I followed the instructions in Grafana Cloud, and also read a lot of the official Grafana documentation, including "Kubernetes Events (beta) | Grafana Cloud documentation" and many other pages.

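Resolved by adding the cadvisor, kubelet, and kube-state-metrics scrape configs to agent.yaml, following "Grafana Agent Metrics Kubernetes quickstart | Grafana Cloud documentation". The agent.yaml posted above has a remote_write target but no scrape_configs, so only the eventhandler logs were being shipped and no metrics were collected.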
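For anyone hitting the same thing, here is a rough sketch of what the added section of agent.yaml can look like. It is modeled on the quickstart from memory, so treat it as an assumption rather than the exact config I ended up with: the job names, the relabel rules, and the app.kubernetes.io/name=kube-state-metrics pod label (which I believe the prometheus-community Helm chart sets) should all be checked against the current quickstart page. The metric allowlists (metric_relabel_configs) from the quickstart are left out to keep it short.

```yaml
metrics:
  configs:
  - name: integrations
    # remote_write stays exactly as in the agent.yaml above.
    scrape_configs:
    # Container metrics, scraped from each node's cAdvisor endpoint via the API server proxy.
    - job_name: integrations/kubernetes/cadvisor
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # "$1" (not "${1}") so Terraform's template rendering leaves it untouched.
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    # Kubelet metrics, same proxy approach with a different path.
    - job_name: integrations/kubernetes/kubelet
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics
    # kube-state-metrics, discovered by pod label (assumed label, see note above).
    - job_name: integrations/kubernetes/kube-state-metrics
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: kube-state-metrics
```

The node jobs rely on the "nodes" and "nodes/proxy" permissions and the kube-state-metrics job on the "pods" permission, all of which are already granted by the ClusterRole above. Once metrics carrying the cluster: devops external label reach Grafana Cloud, the Cluster dropdown should list the cluster instead of None.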