I have a swarm with multiple VMs which we monitor with Grafana.
One of the metrics we collect is the disk usage.
We already have an alert for when disk usage exceeds 75%. However, due to the nature of the project, some of our VMs have very large disks. For those VMs an alert at 75% usage is not useful, because several TB are still free at that point.
I modified the dashboard in Grafana to have two queries.
- Showing the used_percent of disk usage
SELECT mean("used_percent") FROM "disk" WHERE ("path" = '/opt/applications') AND $timeFilter GROUP BY time($__interval), "host" fill(previous)
- Showing the free space on the disk (this query is hidden; I only want to use it for the alert)
SELECT mean("free") FROM "disk" WHERE ("path" = '/opt/applications') AND $timeFilter GROUP BY time($__interval), "host" fill(previous)
Then in the alert I have an AND expression, so that an alert is produced only if disk used_percent is over 75% (as before) AND free disk space is lower than 1TB. This way, on a VM with a huge disk the alert won't fire when usage is just over 75% but more than 1TB is still free.
This seems to work fine, but I have a problem with the email content of the alert notification.
The email lists, from the second alert condition, every VM that has less than 1TB of free disk space (which is the majority of them). I would expect a logical AND between the two alert conditions with host as the key, so that the mail includes only the measurements from the host that matches both conditions.
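To make the expectation concrete, here is a minimal Python sketch of the per-host AND I have in mind. The host names and readings are made up for illustration, and the "observed" set is only my reading of what ends up in the email, not a description of how Grafana evaluates conditions internally.

```python
# Hypothetical per-host readings; host names and values are invented.
TB = 1024 ** 4  # one tebibyte in bytes

used_percent = {"vm-a": 80.0, "vm-b": 60.0, "vm-c": 78.0}
free_bytes = {"vm-a": 0.5 * TB, "vm-b": 0.2 * TB, "vm-c": 3.0 * TB}


def hosts_over_percent(readings, threshold=75.0):
    """Hosts whose used_percent is above the threshold (first condition)."""
    return {host for host, value in readings.items() if value > threshold}


def hosts_under_free(readings, threshold=TB):
    """Hosts whose free space is below the threshold (second condition)."""
    return {host for host, value in readings.items() if value < threshold}


cond_a = hosts_over_percent(used_percent)
cond_b = hosts_under_free(free_bytes)

# What I expect in the email: only hosts that match BOTH conditions.
expected = cond_a & cond_b

# What the email appears to contain: hosts matching EITHER condition,
# reported once the overall AND has fired.
observed = cond_a | cond_b

print("expected:", sorted(expected))
print("observed:", sorted(observed))
```

In this made-up data only vm-a is both over 75% used and under 1TB free, so that is the only host I would want to see reported.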
Is this possible, or am I missing something?
Thanks in advance