Alerts not being sent to Alertmanager

I have set up a ruler in Loki 2.3.0 and can see the rule being applied, but the alert does not seem to be sent to Alertmanager. I can see “Get - deadline exceeded” in the log. What should the log look like if the alert is successfully sent to Alertmanager?

ts=2021-09-08T17:45:37.563238254Z caller=metrics.go:92 org_id=fake traceID=0cbff1adf7072e33 latency=fast query="{namespace=\"snip\"} |= \"220397739\"" query_type=filter range_type=range length=1h0m1s step=1s duration=48.616908ms status=200 limit=1000 returned_lines=8 throughput=3.9MB total_bytes=190kB level=debug

ts=2021-09-08T17:45:37.563417673Z caller=logging.go:66 traceID=0cbff1adf7072e33 msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=%7Bnamespacesnip&start=1631119537000000000&end=1631123138000000000&step=1 (200) 49.154623ms" level=debug

ts=2021-09-08T17:45:39.424481812Z caller=mock.go:149 msg="Get - deadline exceeded" key=collectors/ring

Hi @omyno

Can you please provide your Loki config?
This looks to be an issue communicating with the ring; I don’t think this has anything to do with sending to Alertmanager yet.

Hi @dannykopping and thank you very much for your reply.

This is my configuration (installed via the official Helm chart in Kubernetes). Is it necessary to use a kvstore backend other than inmemory?

extraArgs:
  log.level: debug
config:
  ruler:
    storage:
      type: local
      local:
        directory: /data/.ruler/loki/rules
    rule_path: /data/.ruler/loki/rules-tmp
    alertmanager_url: http://rancher-monitoring-alertmanager.cattle-monitoring-system.svc:9093
    ring:
      kvstore:
        store: inmemory
    enable_api: true
    enable_alertmanager_v2: true
persistence:
  enabled: true
  accessModes:
  - ReadWriteOnce
  size: 10Gi
  existingClaim: pvc-loki
  mountPath: "/data"
resources: {}
securityContext:
  fsGroup: 10001
  runAsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001
readOnlyRootFilesystem: false

Here’s the alert rule:
/data/.ruler/loki/rules/fake/rules.yml:

groups:
  - name: snip
    rules: 
      - alert: HighLogRate
        annotations: 
          message: "App is throwing too many errors per minute"
        expr: 'sum(rate({namespace="snip"}[1m])) > 1'
        for: 1m
        labels: 
          severity: warning
          namespace: snip

Hhmm, I’m not certain about that.

I wonder why/how it’s timing out. Maybe it’s a red herring?

First things first: let’s try changing the alert expression to something you are 100% certain will succeed (like 1+1). See if that results in a call to the AM and what log messages are produced.
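For example, a throwaway rule group along these lines could be dropped next to your existing file under /data/.ruler/loki/rules/fake/ (just a sketch; the group name, alert name, and annotation text are placeholders, and the expression is simply the trivially-true 1+1 suggested above):

groups:
  - name: sanity-check
    rules:
      - alert: AlwaysFiring
        expr: '1+1'
        for: 1m
        labels:
          severity: warning
        annotations:
          message: "Test alert to verify the ruler can reach Alertmanager"

If that one fires, the ruler-to-Alertmanager path is fine and the problem lies in the original expression or its evaluation.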

So the first rule evaluation result was discarded, which seems to be expected, but then I actually got the Alertmanager alert for 1+1. This is a big step forward, thank you very much, @dannykopping!

I send pod logs from another Kubernetes cluster to Fluentd, and they look fine in Grafana itself.

I will research further why this expression does not work in the ruler; it works as expected in Grafana.

Glad that helped!

Just make sure that the ruler has the same storage_config so that it can query your logs.
The ruler itself is basically a querier with rule evaluation bolted on.
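Since this is the single-binary chart, the ruler runs in the same process and reads the same config block, so whatever storage_config the chart renders is also what the ruler queries with. For illustration, a matching section would sit alongside your ruler block roughly like this (a sketch assuming the chart's default boltdb-shipper/filesystem storage; the directory paths here are illustrative, not prescriptive):

config:
  ruler:
    # ... ruler settings as above ...
  storage_config:
    boltdb_shipper:
      active_index_directory: /data/loki/boltdb-shipper-active
      cache_location: /data/loki/boltdb-shipper-cache
      shared_store: filesystem
    filesystem:
      directory: /data/loki/chunks

The key point is that the ruler must be able to read the same index and chunks the queriers read; in the single-binary deployment that comes for free.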
