Currently using Loki 2.7.0. We have configured a rule like this:
groups:
  - name: reverse deadman's alert
    rules:
      - alert: "Loki logs vs threshold."
        annotations:
          message: "Loki logs vs threshold."
          tags: "{{ $labels.aws_account_alias }},{{ $labels.environment }}"
        expr: 'sum by (aws_account_alias) (count_over_time({aws_account_alias="myaccount"}[5m])) > 500'
        for: 5m
If I run this query in Grafana, it returns a single series (which is what we want) that is consistently above 500K (which is the number of logs we get from this particular account). Judging by that metric, the alert should fire and stay active indefinitely. But that's not the case: the alert fires, then closes automatically roughly 10 minutes later.
Given that Alertmanager just listens for API calls, I'm inclined to believe this behavior comes from the ruler, and I wonder if anyone can spot an obvious misconfiguration on my end.
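For reference, the only Alertmanager-side setting I'm aware of that could close an alert on its own is its resolve timeout: if the ruler stops re-sending a still-firing alert, Alertmanager will eventually mark it resolved. This is a sketch of that setting, not taken from our config; 5m is the upstream default and our deployment may use a different value:

global:
  # Alertmanager configuration (not Loki), shown only as a reference point.
  # resolve_timeout is how long Alertmanager waits before resolving an alert
  # on its own when it stops receiving updates for it (it only applies to
  # alerts sent without an EndsAt timestamp).
  resolve_timeout: 5m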
Ruler config:
ruler:
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: {{ alertmanager_url }}
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  storage:
    type: local
    local:
      directory: /etc/loki/rules
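For completeness, these are the ruler timing settings I know of that govern how often the rule is evaluated and how often a still-firing alert is re-sent to Alertmanager. We have not set any of them, so this is a sketch with what I understand the defaults to be; the field names come from the Loki ruler configuration reference and may differ between versions:

ruler:
  # How frequently rule groups are evaluated.
  evaluation_interval: 1m
  # How frequently rule storage is polled for new or changed rule groups.
  poll_interval: 1m
  # Minimum amount of time to wait before re-sending a still-firing alert
  # to Alertmanager.
  resend_delay: 1m
  # How the "for" state of alerts is restored after a ruler restart.
  for_outage_tolerance: 1h
  for_grace_period: 10m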