Grafana Alerting (Spring-Boot Prometheus metrics) - Send an alert notification per new condition check

Hi,

I am trying to create alerts in Grafana for Spring Boot metrics scraped by Prometheus. The use case is to alert on exceptions thrown from each service. I’m using the http_server_requests_seconds_count metric, and below is a breakdown of the PromQL query I’m using to create the graphs.

  • First, I’m excluding all the series where no exception was thrown.

http_server_requests_seconds_count{application="my-service-1",exception!="None"}

  • Next, I’ve applied the rate() function, since the raw metric is just a monotonically increasing counter.

rate(http_server_requests_seconds_count{application="my-service-1",exception!="None"}[5m])

  • Then I’ve used the following condition to trigger an alert. (I’m using the max() reducer because sum() and count() aggregate over all the data points in the window, which is not what I need.)

WHEN max() OF query(A,5m,now) IS ABOVE 0.02
EVALUATE every 1m FOR 5m

The above setup works fine and sends a notification whenever the alert condition is met. However, I’m facing several problems with this approach.

  1. I need the actual count of exceptions instead of a rate

I’ve tried the following approach to solve this, but it still gives a monotonic value until a new exception is thrown.

count_over_time(http_server_requests_seconds_count{application="my-service-1",exception!="None"}[5m])
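
As a side note, if increase() behaves the way I understand it (estimating how much the counter grew over the window), something like the query below might give the number of new exceptions per window instead of a per-second rate. I haven’t verified this against my setup, so treat it as a sketch.

increase(http_server_requests_seconds_count{application="my-service-1",exception!="None"}[5m])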

  2. I’m getting a series for each exception, and unless the alerting state has gone back to OK, Grafana will not send a second notification when the condition is met by a different series.

I thought that if I could get a spike per exception for each series, with the graph staying at 0 the rest of the time, I could solve this issue. So I’ve tried reducing the time interval for the rate() function, but it seems I can only reduce it to 1 minute. Even though this helps a bit, if a second exception comes from another series within that 1 minute, it won’t send a new notification.

rate(http_server_requests_seconds_count{application="my-service-1",exception!="None"}[1m])

WHEN max() OF query(A,1m,now) IS ABOVE 0.02
EVALUATE every 1m FOR 0m
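
If the increase() idea above holds, the same condition could probably be expressed with a threshold in whole exceptions rather than a rate, which should give the spike-and-back-to-zero shape I described. Again, this is only a rough sketch I haven’t validated:

increase(http_server_requests_seconds_count{application="my-service-1",exception!="None"}[1m])

WHEN max() OF query(A,1m,now) IS ABOVE 0
EVALUATE every 1m FOR 0m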

How can I address the above issues and get Grafana to alert per new exception and also send the count instead of a rate?

(I’m using Grafana v6.5.3)

Appreciate your kind help!

Hi @ErandaEG

You should have a much better chance of accomplishing your goal with the new Unified Alerting in Grafana 8, which shipped today. It’s an entirely new alerting platform with much greater power and ease of use, and it includes much better support for variables in alerts.

Check it out:
docker run -p 3000:3000 --name=grafana -e "GF_FEATURE_TOGGLES_ENABLE=ngalert" grafana/grafana:8.0.0


Hi @mattabrams! Thank you so much for replying! :blush:

Just a quick question. Seems like this feature is still not enabled by default, and we have to enable it by adding some settings to the grafana.ini file.

  • [panels] enable_alpha: Set to true if you want to test alpha panels that are not yet ready for general usage. Default is false.

enable_alpha = true

  • Setting the ngalert feature toggle enables the new Grafana 8 Alerting system.

ngalert = true
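
Assuming the grafana.ini setting mirrors the GF_FEATURE_TOGGLES_ENABLE environment variable from the docker command above, my guess is the file would need something like this (I haven’t verified it):

[feature_toggles]
enable = ngalert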

However, I saw the following in the documentation for the FeatureToggles interface (https://grafana.com/docs/grafana/latest/packages_api/data/featuretoggles/).


meta property
Signature: meta: boolean;
Remarks: Available only in Grafana Enterprise

ngalert property
Signature: ngalert: boolean;

Does the remark above belong to the ngalert property, which would mean that the new alerting feature is only available in the Grafana Enterprise version?

I will get clarity on our temporary use of the FeatureToggle for Unified Alerting, but rest assured:

Unified Alerting is available in Grafana OSS


I believe that the feature toggle is designed to keep the legacy alerting intact until the Unified Alerting platform is a little more battle-tested.

Thank you so much for clarifying @mattabrams! I will use the newest update and try to address my problem with it. :blush:
