Hey brpaz,
Thanks for your question!
You are correct that the only way to roll out log collection is to use a DaemonSet. An example manifest that configures logging can be found in the Grafana Agent GitHub repo. Note that promtail has been embedded into the Grafana Agent, so you have the option of using the Grafana Agent or promtail directly.
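To give a rough idea of the shape, a per-Node log collector just needs to run one Agent (or promtail) Pod per Node and mount the Node's log directories. This is only a minimal sketch, not the manifest from the repo; the image tag, namespace, and ConfigMap name are placeholders, and the real example also includes RBAC, tolerations, etc.:

```yaml
# Minimal sketch of a per-Node log collector rolled out as a DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grafana-agent-logs
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: grafana-agent-logs
  template:
    metadata:
      labels:
        name: grafana-agent-logs
    spec:
      containers:
        - name: agent
          image: grafana/agent:latest   # pin a specific version in practice
          args:
            - -config.file=/etc/agent/agent.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/agent
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: grafana-agent-logs   # holds agent.yaml
        - name: varlog
          hostPath:
            path: /var/log
```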
The same is true of node_exporter. Since an exporter must run on each Node, it too must be rolled out as a DaemonSet. Once again, you can roll it out as a standalone DaemonSet, or use the node_exporter integration that's been embedded into the Grafana Agent (and you can use this same Agent for logging as well). An example configuration can be found here. Note that this is a Grafana Agent configuration file and not a K8s manifest.
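Roughly, enabling the embedded integration in the Agent config file looks like this. This is a sketch, not the example from the repo: the remote_write endpoint and credentials are placeholders, and depending on your Agent version the top-level keys may be named prometheus/loki instead of metrics/logs:

```yaml
# Sketch of a Grafana Agent config file (not a K8s manifest).
metrics:
  wal_directory: /tmp/agent/wal
  global:
    scrape_interval: 60s

integrations:
  # Embedded node_exporter; no separate node_exporter DaemonSet needed.
  node_exporter:
    enabled: true
  # Where the integration's metrics get written (placeholder endpoint/creds).
  prometheus_remote_write:
    - url: https://<your-prometheus-endpoint>/api/prom/push
      basic_auth:
        username: <username>
        password: <api-key>
```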
Apologies for the confusion. We are centralizing K8s deployment around the Agent Operator, which should be formally released in the near future. It currently supports metrics; support for logs, traces, and integrations is coming soon. It should make configuring these components much simpler.
For your use case, I would recommend an Agent DaemonSet that enables the node_exporter integration and log collection. If you wanted, you could enable metrics scraping of the kubelet and cadvisor endpoints in this DaemonSet as well, being sure to use the host_filter config parameter.
If you want to scrape any other Pods in your cluster using this DaemonSet, be sure to use the role: pod kubernetes_sd_config with host_filter enabled. If you want to scrape the API server or use other kubernetes_sd_config types, you should use a separate Agent Deployment to scrape those metrics in your cluster (which the K8s integration does by default). We are also working on some improvements to the Agent clustering/scraping service, which will simplify this a bit as well.
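For reference, that Pod-scraping job could look roughly like this. It assumes the common prometheus.io/scrape annotation convention, which may not match how your Pods are annotated:

```yaml
# Goes inside the same `host_filter: true` instance config shown above,
# so each Agent Pod only scrapes the Pods running on its own Node.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only keep Pods that opt in via the prometheus.io/scrape annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```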
For the metrics usage issue, do you mind pointing me to where it mentions keeping usage under the free tier limit? That should be the case for smaller clusters that are not very heavily loaded (we should clarify this wherever it says that). For medium to large clusters (4+ Nodes) running significant workloads, your metric usage will scale with your workloads, as many of the metrics fan out across Nodes, containers, Pods, etc.
To learn how to control your metrics usage, we have some docs and tools that may be helpful; please see Control Prometheus metrics usage. Using these resources, you can tune the defaults to bring your usage down while still keeping the metrics that are important for your use case.
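As one concrete example of the kind of tuning those docs describe, you can drop series you don't need with metric_relabel_configs on a scrape job; the metric names below are just illustrative high-cardinality cadvisor series, not a recommended drop list:

```yaml
scrape_configs:
  - job_name: cadvisor
    # ...kubernetes_sd_configs, TLS, etc. as in the earlier sketch...
    metric_relabel_configs:
      # Drop some high-churn series before they are written to remote storage.
      - source_labels: [__name__]
        regex: container_(network_tcp_usage_total|tasks_state|memory_failures_total)
        action: drop
```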
Hope this helps!