Connecting Alloy to self-hosted Grafana and Prometheus

I’m experimenting with a new Grafana setup after retiring my old Telegraf-based setup. I’m doing OK, but I can’t figure out a couple of configuration tweaks.

I’ve set up the Grafana server with Prometheus. I installed node_exporter on the server along with the Node Exporter Full dashboard, and that’s looking great: the server reports on itself in great detail.

I have some other servers I want to add to this setup with the same level of detail. However, these servers are behind firewalls, so instead of the Grafana server ‘pulling’ data, they must push their data to the collecting server. I have read that the way to do this is to use Grafana Alloy with remote_write. I have this partially working after a few hiccups with firewalls (as always), but there are a few niggles.

The first is a small one. The remote server adds itself to the dashboard with the job name “integrations/unix”. I’d like to be able to set this job name to the server name or another identifier. Where do I do this? I figure that if I add more servers in this manner, they’ll all have the job name integrations/unix, and I want to differentiate them.

The second problem is more concerning. A lot of the data that the locally running node_exporter service supplied is not being supplied by the Grafana Alloy agent in remote_write mode, e.g.:
CPU Basic
Network Traffic Basic
CPU
Network Traffic
Disk IOPS
Memory VMstat
Some of the Process data
Systemd data
7 out of the 8 panels under Storage Disk (although Storage Filesystem is reported fine)
… etc

This seems like quite a lot of data that’s not getting reported. Some of it I can do without, but Disk IO is particularly important for me.

Here is my Alloy config on the remote server:

logging {
  level = "warn"
}


prometheus.exporter.unix "default" {
  include_exporter_metrics = true
  disable_collectors       = ["mdadm","xfs","zfs"]

  filesystem {
    fs_types_exclude     = "^(devtmpfs|tmpfs|overlay|proc|squashfs|pstore|securityfs|sysfs|tracefs|udev)$"
    mount_points_exclude = "^/(proc|run|sys|dev/pts|dev/hugepages)($|/)"
  }

}

prometheus.scrape "default" {
  targets = concat(
    prometheus.exporter.unix.default.targets,
    [{
      // Self-collect metrics
      job         = "alloy",
      __address__ = "127.0.0.1:12345",
    }],
  )

  forward_to = [prometheus.remote_write.metrics_service.receiver]
}


prometheus.remote_write "metrics_service" {
  endpoint {
    url  = "http://myhost.com:9990/api/v1/write"
    name = "remotehost.com"

    basic_auth {
      username = "xxxxxx"
      password = "yyyyyyyyyy"
    }
  }
}

I was playing with excluding filesystem data, but the config still returns the same (incomplete) data even with that stanza removed.

I would recommend using other labels such as app or hostname instead. Some of the components in Alloy can overwrite the job label. You can of course choose to overwrite it again within remote_write, but why work against the tool?
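
For example, here is a minimal sketch against your remote_write block (the hostname value is just a placeholder, and you’d keep your basic_auth as it is). Setting external_labels attaches the label to every series this instance ships, so each server stays distinguishable even if they all keep the integrations/unix job:

prometheus.remote_write "metrics_service" {
  // Attach a per-server label to everything this Alloy instance sends.
  external_labels = { hostname = "server01.example.com" }

  endpoint {
    url = "http://myhost.com:9990/api/v1/write"
  }
}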

Looking at your config, your metrics may be missing because you are not forwarding them. Grafana Alloy doesn’t actually expose the metrics it gathers over its own HTTP endpoint, meaning that if you scrape Alloy’s HTTP endpoint, the metrics you get are from Alloy itself, not from the host. What you want instead is to reference the exporter targets from your prometheus.scrape component. This is what the metrics section of our Alloy configuration looks like:

prometheus.exporter.cadvisor "docker" {
  docker_host = "unix:///var/run/docker.sock"
}

prometheus.exporter.process "default" {}

prometheus.exporter.unix "default" {}

prometheus.scrape "default" {
  scrape_interval = "10s"
  targets         = concat(prometheus.exporter.cadvisor.docker.targets, prometheus.exporter.process.default.targets, prometheus.exporter.unix.default.targets)
  forward_to      = [prometheus.relabel.job_label.receiver]
}

// Add whatever label
prometheus.relabel "job_label" {
  forward_to = [prometheus.remote_write.mimir_receiver.receiver]

  rule {
    action       = "replace"
    replacement  = "host123.local"
    target_label = "hostname"
  }
}

//
// Currently the Alloy agent doesn't have a universal endpoint that exposes all of the Prometheus metrics it scrapes.
// See https://github.com/grafana/alloy/issues/576#issuecomment-2159718786
//

prometheus.remote_write "mimir_receiver" {
  // default_labels is defined elsewhere in our full configuration.
  external_labels = default_labels._this.external_labels

  endpoint {
    url = "<URL>"
  }
}

Thanks for the reply. So if I understand what you’re saying, the problem area in my config is this:

prometheus.scrape "default" {
  targets = concat(
    prometheus.exporter.unix.default.targets,
    [{
      // Self-collect metrics
      job         = "alloy",
      __address__ = "127.0.0.1:12345",
    }],
  )
}

So I tried a couple of other stanzas for targets. First of all I tried this:

  targets = concat( prometheus.exporter.process.default.targets, prometheus.exporter.unix.default.targets)

That caused all metrics to stop being collected. So then I added back the Alloy self-scrape stanza and used:

  targets = concat(
    prometheus.exporter.unix.default.targets,
    prometheus.exporter.process.default.targets,
    [{
      // Self-collect metrics
      job         = "alloy",
      __address__ = "127.0.0.1:12345",
    }],
  )

And that also stopped all metrics from being collected. So I went back to the original config, and the depleted metrics resumed.
I’m leaving the job name issue alone for now, so I’m not yet experimenting with filters.

Try taking away the 127.0.0.1:12345 part. If you need to pipe Alloy’s internal metrics as well, you don’t actually need to scrape for them externally; Alloy can expose them via prometheus.exporter.self. For example (from our own config):

prometheus.exporter.self "internal" {}

// Add an "app" label for self (alloy) metrics.
prometheus.scrape "internal" {
  scrape_interval = "10s"
  targets         = prometheus.exporter.self.internal.targets
  forward_to      = [prometheus.relabel.internal.receiver]
}

prometheus.relabel "internal" {
  forward_to = [prometheus.remote_write.mimir_receiver.receiver]

  rule {
    action       = "replace"
    replacement  = "grafana-alloy"
    target_label = "app"
  }

  rule {
    action       = "replace"
    replacement  = "hostname.local"
    target_label = "hostname"
  }
}

prometheus.exporter.process "default" {}

prometheus.exporter.unix "default" {}

prometheus.scrape "default" {
  scrape_interval = "10s"
  targets         = concat(prometheus.exporter.process.default.targets, prometheus.exporter.unix.default.targets)
  forward_to      = [prometheus.relabel.default_label.receiver]
}

prometheus.relabel "default_label" {
  forward_to = [prometheus.remote_write.mimir_receiver.receiver]

  rule {
    action       = "replace"
    replacement  = "hostname.local"
    target_label = "hostname"
  }
}

prometheus.remote_write "mimir_receiver" {
  endpoint {
    url = "<URL>"
  }
}

Yes, that’s exactly what I tried in my reply above, but taking the Alloy part out of the scraper targets resulted in no data being scraped at all.
For now I’m not using a relabel stanza, just forwarding directly from the scraper to the remote_write, to keep it simple so I can see what’s happening. Are you saying the relabel stanza is key to the whole thing working?

From your test above, you took away prometheus.exporter.process.default.targets; I am saying try taking away 127.0.0.1:12345 from the scraper.
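
(If one of your earlier attempts referenced prometheus.exporter.process.default.targets without also declaring a prometheus.exporter.process "default" {} component, the configuration would fail to load, which would explain why everything stopped.) As a rough sketch against your original config, I mean something like:

prometheus.scrape "default" {
  // Scrape only the unix exporter targets; no 127.0.0.1:12345 self-scrape entry.
  targets    = prometheus.exporter.unix.default.targets
  forward_to = [prometheus.remote_write.metrics_service.receiver]
}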

And no, I don’t believe the relabel block is significant, but you can certainly try.

I gave up in the end.

I’m running into the same issue.

I had previously installed node_exporter on a host and displayed a vast amount of metrics using the Node Exporter Grafana dashboard.

I have now installed Grafana Alloy on the host, and have removed node_exporter.

My dashboard is now missing metrics that the previously installed node_exporter had no issues collecting and displaying (CPU Basic, Network Basic, and others…).

Here is my config.alloy:

// Sample config for Alloy.
//
// For a full configuration reference, see https://grafana.com/docs/alloy
logging {
  level = "warn"
}

prometheus.exporter.unix "default" {
  include_exporter_metrics = true
  enable_collectors = [
    "ethtool",
    "interrupts",
    "ksmd",
    "lnstat",
    "logind",
    "meminfo_numa",
    "mountstats",
    "network_route",
    "ntp",
    "perf",
    "processes",
    "qdisc",
    "runit",
    "softirqs",
    "supervisord",
    "sysctl",
    "systemd",
    "tcpstat",
  ]
}

prometheus.scrape "default" {
  targets = array.concat(
    prometheus.exporter.unix.default.targets,
    [{
      // Self-collect metrics
      job         = "alloy",
      __address__ = "127.0.0.1:12345",
    }],
  )

  forward_to = [
    prometheus.remote_write.default.receiver,
  ]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://selfhosted.prometheus:9090/api/v1/write"
  }
}

Here are some entries from journalctl -u alloy.service which look to be related(?):

level=error msg="unable to parse PSI data: read /sys/fs/cgroup/system.slice/alloy.service/cpu.pr>
level=error msg="unable to parse PSI data: read /sys/fs/cgroup/system.slice/alloy.service/memory>
level=error msg="unable to parse PSI data: read /sys/fs/cgroup/system.slice/alloy.service/io.pre>

Any help would be greatly appreciated.

If you scrape from 127.0.0.1:12345, you are scraping the metrics of the Alloy agent itself, not those of whatever exporter you actually want to scrape. There are various examples in the Alloy configuration documentation; see prometheus.exporter.unix | Grafana Alloy documentation for one of them.
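
As a pared-down sketch of your config.alloy (keeping your collector list if you need it, and your existing remote_write URL), the scrape block only needs the exporter's targets:

prometheus.exporter.unix "default" { }

prometheus.scrape "default" {
  // Scrape the unix exporter's targets; the 127.0.0.1:12345 entry only
  // returns Alloy's own internal metrics, not the host metrics.
  targets    = prometheus.exporter.unix.default.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://selfhosted.prometheus:9090/api/v1/write"
  }
}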

I thought this line in my config would tell Alloy to collect the metrics(?):

prometheus.exporter.unix.default.targets