For many years I have been running Promtail to gather logs and Prometheus to gather metrics. In any given cluster I need to run Promtail on every node so that it can access the node's filesystem and read the log files.
Conversely, I only need to run one instance of Prometheus per cluster, for two reasons:
- Prometheus can easily reach the HTTP endpoints of all the services running in a cluster.
- Multiple instances of Prometheus scraping the same targets would require deduplication downstream.
Moving to Alloy means that I could collect metrics and logs with one tool instead of two. However, if I run Alloy on every node with the same scrape configuration I use for Prometheus today, every instance will scrape the same targets and I will have to deal with deduplication. One option is to have each Alloy instance gather metrics only from the services running on its own node, instead of from all services in the cluster.
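For the per-node option, here is a minimal sketch of what the Alloy configuration might look like. It assumes the cluster is Kubernetes and that each Alloy pod receives its node name in a NODE_NAME environment variable (for example through the downward API); neither is stated above. The component labels and the remote write URL are placeholders.

```alloy
// Assumption: Kubernetes cluster, Alloy running as a DaemonSet, and
// NODE_NAME injected into the pod environment. On older Alloy releases
// the function is env() rather than sys.env().
discovery.kubernetes "local_pods" {
  role = "pod"

  // Only discover pods scheduled on the same node as this Alloy instance.
  selectors {
    role  = "pod"
    field = "spec.nodeName=" + sys.env("NODE_NAME")
  }
}

prometheus.scrape "local_pods" {
  targets    = discovery.kubernetes.local_pods.targets
  forward_to = [prometheus.remote_write.metrics_store.receiver]
}

// Placeholder destination; replace with the real metrics backend.
prometheus.remote_write "metrics_store" {
  endpoint {
    url = "http://metrics.example.internal/api/v1/push"
  }
}
```

Because each instance only sees the pods on its own node, no target should be scraped twice and nothing needs deduplicating downstream.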
How are you doing logs and metrics collection in your system?
On every host we have an Alloy agent installed that collects logs and scrapes metrics “locally”, that is, only from the host it is running on, meaning Linux system metrics and such.
Separately, we have a cluster of Alloy agents (the size depends on the environment) configured in cluster mode, and this cluster is responsible for scraping the actual HTTP endpoints.
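A rough sketch of the clustered scraper side, to show the general shape rather than the configuration actually used here. The target addresses, labels, and URL are made up for illustration.

```alloy
// Assumption: every instance in the scraping cluster is started with
// clustering turned on, e.g. `alloy run --cluster.enabled=true ...`,
// plus some way for peers to find each other (such as
// --cluster.join-addresses). All instances load this same file.
prometheus.scrape "services" {
  // Hypothetical static targets; a discovery.* component could feed
  // this instead.
  targets = [
    {"__address__" = "service-a.example.internal:8080"},
    {"__address__" = "service-b.example.internal:8080"},
  ]
  forward_to = [prometheus.remote_write.metrics_store.receiver]

  // With clustering enabled, the targets are distributed across the
  // cluster peers instead of every peer scraping every target.
  clustering {
    enabled = true
  }
}

// Placeholder destination.
prometheus.remote_write "metrics_store" {
  endpoint {
    url = "http://metrics.example.internal/api/v1/push"
  }
}
```

The point of the clustering block is that the same configuration can run on several instances without producing duplicate samples, which is the deduplication concern raised earlier in the thread.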
@tonyswumac, ok, that means you have a couple of different Alloy configurations running in your system. One definition for the local agents, then another definition for the cluster of agents that scrape HTTP endpoints. That is another option I can consider. Thank you for sharing that.
Does your Alloy cluster also scrape data from your local Alloy agents? Or are the local Alloy agents doing remote_write to log and metric storage destinations?
We are currently doing remote write from the individual Alloy agents to a centralized Alloy cluster, and from there to Mimir. The primary reason we did it this way is that we used to use Telegraf + InfluxDB, which is push-based, and it was easier for us to migrate with a push mechanism.
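One way such a push chain could be wired up, as a sketch only. It assumes the central cluster accepts remote write through prometheus.receive_http, which the post above does not confirm; hostnames, ports, and labels are placeholders.

```alloy
// ---- On each host agent (hypothetical file: agent.alloy) ----
// Locally scraped metrics are forwarded to this component, which
// pushes them to the central Alloy cluster.
prometheus.remote_write "central_alloy" {
  endpoint {
    // prometheus.receive_http serves its write endpoint at this path.
    url = "http://alloy-central.example.internal:9999/api/v1/metrics/write"
  }
}

// ---- On the central cluster (hypothetical file: central.alloy) ----
// Accept pushed samples from the agents...
prometheus.receive_http "from_agents" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 9999
  }
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// ...and forward them to Mimir (placeholder URL).
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir.example.internal/api/v1/push"
  }
}
```

The two halves are separate configuration files; they are shown together only to make the agent, central cluster, Mimir path easy to follow.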
We did run a POC that scraped all the Alloy agents, using EC2 discovery, and it worked quite well too. We just haven’t considered whether we want to switch.
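For anyone curious what the pull variant might look like, a sketch assuming the agents expose their metrics on Alloy's default HTTP port (12345) and that the region and URL below are placeholders. This is not the POC configuration itself.

```alloy
// Discover EC2 instances; each becomes a target at <private IP>:12345.
// In practice a filter block would narrow this to the hosts that
// actually run an agent.
discovery.ec2 "alloy_agents" {
  region = "us-east-1" // placeholder region
  port   = 12345       // assumption: agents use Alloy's default port
}

prometheus.scrape "alloy_agents" {
  targets    = discovery.ec2.alloy_agents.targets
  forward_to = [prometheus.remote_write.mimir.receiver]

  // Spread the discovered agents across the scraping cluster.
  clustering {
    enabled = true
  }
}

// Placeholder destination.
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir.example.internal/api/v1/push"
  }
}
```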