Top Queries & Top Processes (Windows) - Best way to approach

I’ve been working on various monitoring improvements using Grafana and InfluxDB.
I’m wondering if anyone has tackled something that provides detail on “top queries”, or anything beyond a simple time series metric. This is one area where I haven’t been able to figure out a good solution, and solving it would close a big gap for me between Grafana as a monitoring tool and some commercial equivalents.

I know how to get data pushed by polling perfmon and in various other ways, but I’m not sure how you would suggest approaching this, if it is even possible.

This also relates to the “top processes” on a server at a particular moment. I can capture these via PowerShell and push them, but I’m not sure of the best schema design for this. Any tips are welcome!

Hi,

I’ve had some experience monitoring Windows machines in the past. We ended up using Telegraf (which you may know if you’ve worked with InfluxDB) as the metrics collector, together with its Prometheus output plugin. Prometheus has an aggregation operator called topk, which you can use for “top” queries.
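
To make the topk idea concrete, here is a small Python sketch of what the operator does at a single evaluation instant: out of a set of labelled values it keeps the k largest. The function name simply mirrors the PromQL operator; the process names and CPU numbers are made up, and this is only an illustration of the behaviour, not how Prometheus implements it.

```python
import heapq


def topk(k, samples):
    """Return the k (label, value) pairs with the largest values.

    `samples` maps a label (here just a process name, an assumption
    made for this sketch) to its value at one evaluation instant.
    """
    return heapq.nlargest(k, samples.items(), key=lambda item: item[1])


# Hypothetical CPU percentages per process at one scrape.
cpu_by_process = {
    "sqlservr": 61.5,
    "w3wp": 22.0,
    "svchost": 3.2,
    "explorer": 1.1,
}

for name, value in topk(2, cpu_by_process):
    print(name, value)
```

Keep in mind that in a graph (range query) the selection is made again at every step, so the set of series that shows up can change over time.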

Telegraf supports Windows performance counters out of the box, and with that you can gather process metrics. However, we had some issues with multiple processes sharing the same name, see issue. We ended up writing a PowerShell script instead to gather process metrics (CPU and memory) and had Telegraf execute it at regular intervals.
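
As a rough idea of what such a script can emit, here is a minimal sketch that picks the top N processes by CPU and prints them as InfluxDB line protocol, which Telegraf can parse from a script's output when the input is configured with the influx data format. It is written in Python rather than PowerShell purely for illustration. The measurement name `top_process`, the tag and field names, and the sample data are all assumptions, not what we actually ran. Putting the PID in a tag keeps two processes with the same name apart, at the cost of more series.

```python
def escape_tag(value):
    """Escape the characters that are special in line-protocol tag values."""
    value = str(value)
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line(sample, host, timestamp_ns):
    """Format one process sample as a single line-protocol point."""
    tags = (
        f"host={escape_tag(host)},"
        f"process={escape_tag(sample['name'])},"
        f"pid={sample['pid']}"
    )
    # Floats are written as-is; integers carry an 'i' suffix.
    fields = (
        f"cpu_percent={float(sample['cpu_percent'])},"
        f"memory_bytes={int(sample['memory_bytes'])}i"
    )
    return f"top_process,{tags} {fields} {timestamp_ns}"


def top_process_lines(samples, host, n, timestamp_ns):
    """Keep the n samples with the highest CPU and format them."""
    top = sorted(samples, key=lambda s: s["cpu_percent"], reverse=True)[:n]
    return [to_line(s, host, timestamp_ns) for s in top]


# Hypothetical samples; a real script would read these from the OS.
samples = [
    {"name": "w3wp", "pid": 4312, "cpu_percent": 22.0, "memory_bytes": 734003200},
    {"name": "sqlservr", "pid": 1204, "cpu_percent": 61.5, "memory_bytes": 8589934592},
    {"name": "svchost", "pid": 980, "cpu_percent": 3.2, "memory_bytes": 52428800},
    {"name": "w3wp", "pid": 5520, "cpu_percent": 7.25, "memory_bytes": 419430400},
]

lines = top_process_lines(samples, "WEB01", 3, 1700000000000000000)
for line in lines:
    print(line)
```

All points from one run share the same timestamp, so in Grafana you can select a moment in time and list the processes recorded for it.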

I don’t have that much experience with InfluxDB, but it seems to support top and bottom functions as well.

Hopefully this has given you enough information to get your solution working. Please let me know if you have questions or concerns.

Marcus