We group logs by app name, log message, log level and metric key, and send each unique error log to Alertmanager.
We have observed that Loki fails to send some error messages to Alertmanager.
For example, if within a given time interval Loki groups 5 error messages from different apps with different metric keys, it unexpectedly forwards only 2 error events (we expected all 5 to be sent).
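To make the expectation concrete, the following is a minimal Python sketch (not Loki code) of the grouping the rule is meant to perform: count ERROR lines per (log, app, level, metric) combination and keep every combination whose count is above zero. The `LogLine` type, the `group_errors` helper and the sample values are hypothetical. In Prometheus-style alerting rules, which Loki's ruler follows, each resulting series becomes its own alert instance, so five distinct groups should yield five alerts.

```python
from collections import Counter
from typing import NamedTuple


# Hypothetical parsed log line; the field names mirror the labels used in the
# rule (app, level, metric, log), but the sample values below are invented.
class LogLine(NamedTuple):
    app: str
    level: str
    metric: str
    log: str


def group_errors(lines):
    """Count ERROR lines per (log, app, level, metric), like `sum by (...)`."""
    counts = Counter(
        (line.log, line.app, line.level, line.metric)
        for line in lines
        if line.level == "ERROR"  # mirrors the {level="ERROR"} stream selector
    )
    # Mirrors the `> 0` comparison: keep only groups with at least one line.
    return {key: n for key, n in counts.items() if n > 0}


lines = [
    LogLine("app-a", "ERROR", "m1", "db timeout"),
    LogLine("app-a", "ERROR", "m1", "db timeout"),  # same group as the line above
    LogLine("app-b", "ERROR", "m2", "db timeout"),
    LogLine("app-c", "ERROR", "m3", "disk full"),
    LogLine("app-d", "ERROR", "m4", "null pointer"),
    LogLine("app-e", "ERROR", "m5", "auth failed"),
    LogLine("app-e", "INFO", "m5", "started"),  # dropped: not an ERROR line
]

groups = group_errors(lines)
print(len(groups))  # prints 5: one expected alert instance per group
```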
Alerts are produced by evaluating the following rule:
```yaml
rules:
  - alert: ErrorInLogs
    expr: |
      sum by (log,app,level,metric) (count_over_time({level="ERROR"} | regexp " (?P<log>.+)" [1m])) > 0
    for: 0s
```
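The `regexp` stage decides what ends up in the `log` label. The Python sketch below applies the same pattern, assuming Loki matches it unanchored and takes the leftmost match; Python's `re` accepts the same `(?P<name>...)` named-group syntax as Go's RE2. The sample line and the `extract_log_label` helper are hypothetical.

```python
import re

# Same pattern as the rule's `regexp` stage: a space, then everything after it.
PATTERN = re.compile(r" (?P<log>.+)")


def extract_log_label(line):
    """Return what the `log` label would hold, or None if the pattern does not match."""
    match = PATTERN.search(line)  # unanchored search: leftmost match wins
    return match.group("log") if match else None


print(extract_log_label("2024-01-01T10:00:00Z ERROR db timeout"))
# prints: ERROR db timeout
```

With this pattern, everything after the first space becomes the label value, so two lines that differ anywhere after that space fall into separate groups.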
How can we resolve this missing-alert issue?