Monitoring hundreds of Linux hosts

Hi all,

I’m looking at solutions for gathering OS/system metrics from hundreds of Linux hosts and feeding them to Grafana.

I’ve been experimenting with the Telegraf agent deployed on my hosts, reporting to a site-local Telegraf receiver, which batches up the metrics and sends them on to InfluxDB at a central site where I also run Grafana. This seems to work OK.
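
For anyone picturing the relay step, here is a rough Python sketch of what "batching up metrics" for InfluxDB amounts to: points are formatted as InfluxDB line protocol and grouped into newline-joined payloads. This is not how Telegraf itself is implemented; the function names `to_line` and `batches` are made up for illustration, and the sketch assumes simple measurement names and float-only fields.

```python
from typing import Dict, Iterable, Iterator, List


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, float], ts_ns: int) -> str:
    """Format one point as InfluxDB line protocol (float fields only).

    Assumes the measurement name contains no commas or spaces.
    """
    def esc(s: str) -> str:
        # Commas, equals signs and spaces are special in tag/field keys
        # and tag values, so they are backslash-escaped.
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

    tag_part = "".join(f",{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{esc(k)}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


def batches(lines: Iterable[str], size: int) -> Iterator[str]:
    """Group lines into newline-joined payloads of at most `size` lines."""
    buf: List[str] = []
    for line in lines:
        buf.append(line)
        if len(buf) == size:
            yield "\n".join(buf)
            buf = []
    if buf:
        # Flush whatever is left over as a final, smaller payload.
        yield "\n".join(buf)
```

Each payload yielded by `batches` could then be sent upstream in a single write, which is the point of having a site-local receiver: many small per-host reports become a few larger requests over the WAN link.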

Historically we have used SNMP to poll servers for information. By comparison, the Telegraf agent takes at least double the RES memory and far more (3 or 4 times) VIRT memory.
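
If anyone wants to reproduce that comparison on their own hosts, a minimal sketch follows. It reads `VmRSS` and `VmSize` from `/proc/<pid>/status`, which correspond to the RES and VIRT columns in `top`. The helper names `parse_status` and `memory_of` are hypothetical, and you would need to supply the PIDs of your own agent processes.

```python
from typing import Dict


def parse_status(text: str) -> Dict[str, int]:
    """Extract VmRSS and VmSize (in kB) from /proc/<pid>/status content."""
    wanted = {"VmRSS", "VmSize"}
    out: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # The value looks like "   45000 kB"; keep only the number.
            out[key] = int(rest.split()[0])
    return out


def memory_of(pid: int) -> Dict[str, int]:
    """Read the resident and virtual sizes of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status(fh.read())
```

Calling `memory_of` once for the SNMP daemon's PID and once for the Telegraf agent's PID gives two dictionaries that can be compared directly.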

Has anyone had a good experience polling hundreds of Linux nodes over SNMP and serving those metrics to Grafana (Telegraf/InfluxDB optional!)?

What tools do you use or recommend for extracting this type of data and serving it to Grafana?

Thanks
Angus

You should try Prometheus (https://prometheus.io/). It is better at this sort of polling and metrics collection.

Thanks, I will take a look!