-
What Grafana version and what operating system are you using?
8.4
-
What are you trying to achieve?
Reduce minimum alert rule evaluation interval below 10s
-
How are you trying to achieve it?
Changed unified_alerting_min_interval to 1s
-
What happened?
Grafana will not start. It fails with the error: Failed to start grafana. error: value of setting 'min_interval' should be greater than the base interval (10s)
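To make the error concrete, here is a minimal Python sketch of the kind of startup check that produces it. This is a hypothetical re-implementation, not Grafana's code. It assumes the scheduler's base interval is a fixed 10 seconds, and it also assumes (as I understand the setting) that `min_interval` must be a whole multiple of that base interval.

```python
from datetime import timedelta

# Assumption: the unified alerting scheduler ticks every 10 seconds
# (the "base interval" named in the error message).
BASE_INTERVAL = timedelta(seconds=10)


def validate_min_interval(min_interval: timedelta, base: timedelta = BASE_INTERVAL) -> None:
    """Raise ValueError if min_interval would be rejected at startup.

    Hypothetical re-implementation for illustration only.
    """
    label = f"{base.total_seconds():.0f}s"
    # A value below the scheduler tick can never be honoured, so it is refused.
    if min_interval < base:
        raise ValueError(
            f"value of setting 'min_interval' should be greater than the base interval ({label})"
        )
    # Assumption: intervals must also line up with whole scheduler ticks.
    if min_interval % base != timedelta(0):
        raise ValueError(
            f"value of setting 'min_interval' should be a multiple of the base interval ({label})"
        )


if __name__ == "__main__":
    validate_min_interval(timedelta(seconds=10))  # accepted
    try:
        validate_min_interval(timedelta(seconds=1))  # the value tried above
    except ValueError as err:
        print(err)
```

In this model, 1s fails the first check, so lowering `min_interval` alone cannot get below 10s.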
-
What did you expect to happen?
The minimum interval to decrease to 1s.
Is it possible to evaluate rules every <10s? I can’t find what the ‘base interval’ is or how to modify it.
Hi! I’m afraid it is not possible to reduce the minimum interval below 10 seconds. Do you want alert rules that are evaluated more frequently than every 10 seconds?
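One way to picture the ‘base interval’ is as a fixed scheduler tick, with each rule evaluated every N ticks. The Python sketch below is a simplified mental model under that assumption, not Grafana's actual scheduler; the 10s tick and the rule names `api-health` and `disk-usage` are hypothetical.

```python
BASE_INTERVAL_S = 10  # assumed scheduler tick, in seconds


def rules_due(tick: int, rule_intervals_s: dict[str, int], base_s: int = BASE_INTERVAL_S) -> list[str]:
    """Return the names of rules evaluated on a given scheduler tick.

    Simplified model: a rule with interval I runs every I // base ticks,
    so no rule can run more often than once per base interval.
    """
    due = []
    for name, interval_s in rule_intervals_s.items():
        frequency = interval_s // base_s  # ticks between evaluations
        if frequency < 1:
            # Anything shorter than one tick cannot be scheduled in this model.
            raise ValueError(f"{name}: interval {interval_s}s is below the {base_s}s base interval")
        if tick % frequency == 0:
            due.append(name)
    return due


if __name__ == "__main__":
    rules = {"api-health": 10, "disk-usage": 60}  # hypothetical rules
    for tick in range(7):
        print(f"t={tick * BASE_INTERVAL_S}s", rules_due(tick, rules))
```

Here `api-health` runs on every tick and `disk-usage` on every sixth, while a 1s rule is rejected outright because it would fall between ticks.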
I ended up writing a backend plugin that solved my issue, but yes, the original goal was to have certain alert rules evaluated more often than every 10s.
Can you share how you were able to achieve this? Is the plugin open source?
Also, I have another requirement. I am sending alerts to a Google Hangouts space.
I want to customize the notification template/format. Is that possible in Grafana 10.x?
I’m in the same situation. We want to use Prometheus / Blackbox probes on our API health endpoints and be alerted faster than 10s when the health check indicates the service is down.
This is a production environment, with load balancing on the services to guarantee high availability.
10s is just too long to wait for a notification.
How do you know that the service is down with just 10 seconds of data?
A production service should reply with a 200 OK at the /health endpoint.
We require 3 failures to be reported before alerting.
Assuming the 10s minimum, this means we only find out after 30s (not counting the extra ‘for’ duration in Grafana).
This is too long for production.
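The 30s figure follows from a simple worst-case calculation. The sketch below assumes one new health-check result per evaluation and that the outage starts just after an evaluation, so the first failed result arrives up to one interval later and each further required failure adds another interval; the optional ‘for’ duration is added on top. It is a rough bound, not a model of Grafana internals.

```python
def seconds_until_alert(failures_required: int, eval_interval_s: float, for_s: float = 0.0) -> float:
    """Rough worst-case seconds from outage start until the rule fires.

    Assumes one new health-check result per evaluation and that the
    outage begins just after an evaluation.
    """
    return failures_required * eval_interval_s + for_s


if __name__ == "__main__":
    print(seconds_until_alert(3, 10))  # 3 failures at the 10s minimum
    print(seconds_until_alert(3, 10, for_s=30))  # plus a hypothetical 30s 'for'
```

With 3 required failures and a 10s interval this gives 30 seconds before any ‘for’ duration, matching the estimate above.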
I think this is the important bit. As you probably already know, there is a balance between false positives and time to alert. The lower the time to alert, the higher the chance of false positives (i.e. reporting the system as down when it actually isn’t).
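To put rough numbers on that trade-off, the sketch below assumes each health check fails spuriously and independently with some probability (the 1% used in the example is hypothetical) while the service is actually healthy. Requiring k consecutive failures multiplies the delay by k but shrinks the false positive rate by a factor of the failure probability for each extra failure. Overlapping windows are counted separately, so treat the result as an upper-bound style estimate rather than an exact alert count.

```python
def expected_false_alerts_per_day(p_transient: float, check_interval_s: float, failures_required: int) -> float:
    """Rough expected number of windows per day in which `failures_required`
    consecutive checks all fail spuriously while the service is healthy.

    Illustrative assumptions: failures are independent with probability
    p_transient, and overlapping windows are counted separately.
    """
    checks_per_day = 86_400 / check_interval_s
    return checks_per_day * p_transient ** failures_required


if __name__ == "__main__":
    check_interval_s = 1  # hypothetical 1s probe
    for k in (1, 2, 3):
        delay = k * check_interval_s  # worst-case seconds before alerting
        rate = expected_false_alerts_per_day(0.01, check_interval_s, k)
        print(f"require {k} failure(s): ~{delay}s to alert, ~{rate:.4f} false alerts/day")
```

With 1s checks and a 1% spurious failure rate, alerting on a single failure gives roughly 864 spurious windows per day, while requiring three brings it below 0.1 per day at the cost of a few extra seconds.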
Prometheus, and Grafana (whose alerting is based on Prometheus), are much more opinionated about reducing false positives at the expense of time to alert.
I’m not sure you will be able to get accurate alerts when the end-to-end time is less than 10 seconds. You would need to increase the sample rate of /health to something like once a second, set the evaluation interval to 10s and the ‘for’ duration to 0s, and then set group wait and group interval to 1s. However, I would expect a lot of false positive alerts with such a setup.
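A rough way to see what that setup buys is to add up the worst-case delay of each stage. The sketch assumes the stages run one after another, each can add up to its full delay, and network and processing time are negligible. Group interval is left out because, as I understand it, it only delays follow-up notifications for a group that has already notified, not the first one.

```python
from dataclasses import dataclass


@dataclass
class AlertPipeline:
    scrape_interval_s: float  # how often /health is probed
    eval_interval_s: float    # alert rule evaluation interval
    for_s: float              # pending ('for') duration before firing
    group_wait_s: float       # notification policy group wait

    def worst_case_latency_s(self) -> float:
        """Rough upper bound on time from outage start to first notification.

        Assumes sequential stages, each adding up to its full delay.
        """
        return self.scrape_interval_s + self.eval_interval_s + self.for_s + self.group_wait_s


if __name__ == "__main__":
    suggested = AlertPipeline(scrape_interval_s=1, eval_interval_s=10, for_s=0, group_wait_s=1)
    print(suggested.worst_case_latency_s())
```

For the settings suggested above this comes to about 12 seconds, and with a 0s ‘for’ a single failed probe can trigger a notification, which is where the false positives come from.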
@rujoesmith
Hello,
Kindly share details on the ‘backend plugin’ and how you were able to get alert rules evaluated more often than every 10 seconds.