I send logs from 100 VMs to Loki via Promtail and view them in Grafana. When I look at the logs for the last month, I can see the list of all VMs thanks to a label I call “instance” in Promtail. However, suppose 2 VMs have not been able to send logs for 1 day. How do I find out which ones they are?
If it were Prometheus, I could see which targets were up and which were down, since it pulls from them. But what can I do here?
A couple of things to consider:
You can approach this with meta-monitoring (monitoring the monitoring). Add your Promtail agents as Prometheus scrape targets, and you can see the status of each agent just as you would the status of any other monitored instance.
You can plot a graph of the log count from all VMs, grouped by the instance label. From the graph you should be able to see which ones are not sending logs (presumably their count would be 0). You can also sort the legend by last value.
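If you would rather get a list than read it off a graph, the count-per-instance idea can be sketched in Python against Loki's HTTP query endpoint. This is only a rough sketch under some assumptions: the LogQL selector `{instance=~".+"}`, the 24h window, and the helper names `fetch_counts` and `silent_instances` are made up for illustration, and the list of expected VMs has to come from you. One thing to watch for: an instance with no logs in the window may simply be missing from the result rather than showing a count of 0, so the sketch compares against the expected list instead of looking for zeros only.

```python
import json
import urllib.parse
import urllib.request

# Assumed LogQL: number of log lines per instance over the last 24 hours.
# Adjust the selector to match the labels your Promtail config really sets.
QUERY = 'sum by (instance) (count_over_time({instance=~".+"}[24h]))'


def fetch_counts(loki_url, query=QUERY):
    """Run an instant query against Loki and return the parsed JSON body."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{loki_url}/loki/api/v1/query?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def silent_instances(expected, response):
    """Return expected instances that are absent from the result or count 0."""
    counts = {}
    for series in response["data"]["result"]:
        name = series["metric"].get("instance")
        # Each vector sample is [unix_timestamp, "value as string"].
        counts[name] = float(series["value"][1])
    return sorted(i for i in expected if counts.get(i, 0) == 0)


# Hand-written sample in the shape of a Loki instant-query (vector) response.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "vm-01"}, "value": [1700000000, "1520"]},
            {"metric": {"instance": "vm-03"}, "value": [1700000000, "87"]},
        ],
    },
}

print(silent_instances(["vm-01", "vm-02", "vm-03"], sample))  # ['vm-02']
```

Against a real server you would call something like `silent_instances(my_vm_list, fetch_counts("http://loki:3100"))`, where the URL and `my_vm_list` are placeholders for your own setup. The same LogQL expression can also be pasted into a Grafana panel for the graph described above.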