Loki ruler sometimes does not raise alert

We have seen that the Loki ruler sometimes does not raise an alert.
There are no errors in the logs.

Any pointers or settings we need to look into?

When I query from the Grafana Loki UI, I can get results during that time window.
The alert query used is similar to the one below:

sum(
  count_over_time(
    {foo="bar"}
      |= "bazzError"
      # extract the entire log line as a label
      | regexp `(?P<log>.+)`
    [4m]
  )
) by (log)

Hi @jsmorphiz

Can you please provide your alert definition as well?

Please find the alert definition below.
I am able to get alerts; however, it is not consistent and sometimes alerts are missed, though I am able to view the results in the Grafana Loki UI.

groups:
  - name: rate-alerting
    rules:
      - alert: ServiceFailure
        expr: |
          sum(count_over_time({job="foo"} |~ "^.*status:.*Failure;" | regexp `(?P<log>.+)` | regexp `(?:custID:(?P<SomeID>[0-9]+);c)` [4m])) by (SomeID, log)
        labels:
            severity: warning
            category: logs
        annotations:
            summary: Service Failure for SomeID {{ $labels.SomeID }}
            description: Service Failure for SomeID {{ $labels.SomeID }}, log message {{ $labels.log }}

OK thanks.

I don’t see anything obviously wrong here.
Are you sure your Alertmanager instance is 100% available?
Also, what’s your evaluation_interval value set at?

Hi @dannykopping ,

Yes, Alertmanager is 100% available. I have also set the log level to debug, so I can see all alerts coming from Loki.
We have not set evaluation_interval, so it is using the default value of 1m.
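For reference, in case the default needs adjusting: the evaluation interval can be set globally in the Loki config under the ruler block, or per rule group in the rules file (the values here are illustrative, not recommendations):

```yaml
# Loki config: global default for all rule groups
ruler:
  evaluation_interval: 1m

# Or per group in the rules file, which overrides the global default:
# groups:
#   - name: rate-alerting
#     interval: 1m
#     rules: [...]
```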

Are you sure the alert was not already raised? AFAIK the behaviour in Alertmanager is that it will not fire another notification if the same alert is received within a given period of time.
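For context, that notification suppression is governed by the route settings in the Alertmanager config: a still-firing alert is only re-notified after repeat_interval. A minimal sketch (receiver name and values are examples, not recommendations):

```yaml
route:
  receiver: default
  group_by: ['alertname', 'SomeID']
  group_wait: 30s      # wait before sending the first notification for a new group
  group_interval: 5m   # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h  # re-send a notification for a still-firing alert only after this long
```

If repeat_interval is long, an alert that fires, resolves, and fires again inside that window can look like a "missed" alert even though the ruler evaluated it correctly.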

In any case, there are many moving parts here, which makes this pretty difficult to diagnose. If this is reproducible, I'd suggest removing AM from the equation: configure your AM URL to point at a service that will capture the requests (like https://requestbin.net/), and validate that the requests are successfully sent. If they aren't, we can dig deeper into why there appear to be gaps.
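Concretely, that test amounts to temporarily pointing the ruler's Alertmanager URL at the request-capturing endpoint (the URL below is a placeholder, not a real bin):

```yaml
ruler:
  # Temporarily swap your real Alertmanager URL for a request-capturing endpoint
  # to verify that notifications actually leave the ruler:
  alertmanager_url: https://your-bin-id.requestbin.net
```

If every expected firing shows up in the bin, the gap is on the Alertmanager side; if not, the ruler itself is skipping evaluations or discarding results.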

Yes, I can confirm Alertmanager is not the issue (verified from its logs). I can also see logs in the Loki ruler whenever it fetches results; they are logged with “Rule evaluation result discarded… <>”.
We are on Loki 2.3.

That message originates from the underlying Prometheus code.

Do you see any other errors starting with “Error on ingesting”?

hi @dannykopping ,

I don’t see any errors with “Error on ingesting”.

One more thing I would like to add: the ruler and ingester are on different hosts. Not sure if this is correct.
Server 1 (querier, query-frontend, ruler), server 2 (ingester, distributor).
Are there any other parameters or fields in the Loki config we need to look into?

The ruler behaves like a querier, so that shouldn't matter.