First, I am new to Grafana, so please be patient. I have reviewed many documents and videos, but they seem to use earlier versions of Grafana; mine is 10.3.1. That said, I am trying to get an alert when my Proxmox VE CPU usage goes over a threshold, for example 20%.
The alert query I am using is:
from(bucket: "proxmoxve1")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
r._measurement == "cpustat" and
r._field == "cpu"
)
|> filter(fn: (r) => r["host"] == "pve")
|> aggregateWindow(every: v.windowPeriod, fn: mean)
Expressions B, C, and D, and the resulting output, were posted as screenshots.
I have changed the values many times and can’t seem to get the results I need.
I didn’t test this, but I’m thinking the issue may be your aggregateWindow setting. You have fn: mean set, which could be averaging the values out over the time window.
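If averaging is the concern, one quick check is to swap mean for max so that short spikes survive the windowing. This is a minimal, untested sketch that reuses the bucket, measurement, and host from the query above; `createEmpty: false` is an addition of mine to skip windows that contain no data.

```flux
from(bucket: "proxmoxve1")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "cpustat" and r._field == "cpu")
    |> filter(fn: (r) => r["host"] == "pve")
    // max keeps the highest sample in each window instead of averaging it away
    |> aggregateWindow(every: v.windowPeriod, fn: max, createEmpty: false)
```

If the max series crosses the threshold while the mean series does not, the averaging is what hides the spikes.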
Hmm, doesn’t look like any of the values you have seen were above 0.2 in either graph you provided…
I’m also curious why you have an “expression D” for math in your alert config.
Can you double check the threshold you’ve got configured?
I just did a little test, to demonstrate. My ESXi host is UP, so the ping value is returning “1”. I set the threshold to “is above 2” and it shows normal. When I set it to “is above 0” it shows the alert is firing.
You can adjust the threshold, and then click “Preview” and it will tell you the number of values that your alert would be “firing” for.
Please play around with this and find the right setting.
How often are you recording data points? Every second? Every minute?
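If you are not sure, one way to find out is to count the points that land in each minute. A sketch, assuming the same bucket, measurement, and host as the original query; the 1m window is an arbitrary choice for illustration.

```flux
from(bucket: "proxmoxve1")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "cpustat" and r._field == "cpu")
    |> filter(fn: (r) => r["host"] == "pve")
    // count returns the number of samples per 1-minute window
    |> aggregateWindow(every: 1m, fn: count, createEmpty: false)
```

A count of about 6 per window would suggest one sample roughly every 10 seconds, and a count of 1 would suggest one sample per minute.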
Can you give the above in terms of a time window, e.g. “I am trying to get an alert when my average Proxmox VE value goes over 20% within a 15-second time window”?
As @mcbrineellis alluded to, your aggregateWindow statement should be changed to reflect your desired alert state, e.g.
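For a fixed 15-second averaging window, the query could look something like the sketch below. The 15s value only mirrors the example time window mentioned above, `createEmpty: false` is my assumption to avoid null rows, and the rest is copied from the original query.

```flux
from(bucket: "proxmoxve1")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "cpustat" and r._field == "cpu")
    |> filter(fn: (r) => r["host"] == "pve")
    // fixed 15s window instead of the Grafana-supplied v.windowPeriod
    |> aggregateWindow(every: 15s, fn: mean, createEmpty: false)
```

Note that a window shorter than the recording interval will usually hold at most one sample, so the mean is then just that sample.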
Can you post a sample output (in tabular form) of the above query? That will help answer the question you asked about your output vs that of @mcbrineellis
In your alert, I think you can remove Expression D because it’s not being used anywhere in the alert.
What I am trying to get is an alert if the CPU value is over 20% as in this image:
If I right click on that panel and select a new alert it loads this query:
from(bucket: "proxmoxve1")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
r._measurement == "cpustat" and
r._field == "cpu"
)
|> filter(fn: (r) => r["host"] == "pve")
|> aggregateWindow(every: v.windowPeriod, fn: mean)
But when I preview the alert I see this:
Why don’t I see the values like shown in the first image?
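One possible explanation (an assumption, not something confirmed in this thread): the panel may be using a "Percent (0.0-1.0)" unit, so a stored value of 0.2 is drawn as 20%, while the alert preview shows the raw 0.2. If that is the case, either set the alert threshold to 0.2, or scale the value in the alert query. The sketch below assumes the cpu field is a float between 0 and 1.

```flux
from(bucket: "proxmoxve1")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "cpustat" and r._field == "cpu")
    |> filter(fn: (r) => r["host"] == "pve")
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
    // assumption: _value is a fraction (0.0-1.0); convert it to a percentage
    |> map(fn: (r) => ({r with _value: r._value * 100.0}))
```

With the scaled query, a threshold of "is above 20" would correspond to 20% CPU. If the field is already stored as 0-100, this step is not needed.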