When alert rules are deployed to Loki, the ruler's evaluations fail with status=500, and the alerts fire completely inconsistently compared to running the same query manually in Grafana.
"log": "level=info ts=2024-10-09T19:46:01.141151674Z caller=engine.go:248 component=ruler evaluation_mode=local org_id=fake msg=\"executing query\" type=instant query=\"(sum(count_over_time({namespace=\\\"ai\\\"} |= \\\"check completed\\\"[10m])) < 30)\" query_hash=3077565986"
"log": "level=info ts=2024-10-09T19:46:01.228229623Z caller=metrics.go:217 component=ruler evaluation_mode=local org_id=fake latency=fast query=\"(sum(count_over_time({namespace=\\\"ai\\\"} |= \\\"check completed\\\"[10m])) < 30)\" query_hash=3077565986 query_type=metric range_type=instant length=0s start_delta=694.286852ms end_delta=694.286992ms step=0s duration=87.000739ms status=500 limit=0 returned_lines=0 throughput=4.1GB total_bytes=357MB total_bytes_structured_metadata=715kB lines_per_second=847349 total_lines=73720 post_filter_lines=48 total_entries=1 store_chunks_download_time=0s queue_time=0s splits=0 shards=0 query_referenced_structured_metadata=false pipeline_wrapper_filtered_lines=0 chunk_refs_fetch_time=155.03µs cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_volume_results_req=0 cache_volume_results_hit=0 cache_volume_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s cache_result_query_length_served=0s ingester_chunk_refs=0 ingester_chunk_downloaded=0 ingester_chunk_matches=73 ingester_requests=8 ingester_chunk_head_bytes=2.5MB ingester_chunk_compressed_bytes=41MB ingester_chunk_decompressed_bytes=354MB ingester_post_filter_lines=48 congestion_control_latency=0s index_total_chunks=0 index_post_bloom_filter_chunks=0 index_bloom_filter_ratio=0.00 index_shard_resolver_duration=0s disable_pipeline_wrappers=false"
Running Loki Helm chart 6.16.0 in SSD (simple scalable) mode; alerts are deployed to Loki with the grafana/cortex-rules-action GitHub Action.
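The deployment step in the workflow looks roughly like this (endpoint, tenant and secret are placeholders, and the exact env var names are from memory of the action's README, so treat them as approximate):

      - name: Deploy Loki alert rules
        uses: grafana/cortex-rules-action@master
        env:
          CORTEX_ADDRESS: http://<loki_gateway>/
          CORTEX_TENANT_ID: fake
          CORTEX_API_KEY: ${{ secrets.LOKI_API_KEY }}
          ACTION: sync
          RULES_DIR: "./loki-rules/"
          BACKEND: loki

The rule group being deployed: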
groups:
  - name: AiErrorAlerts
    rules:
      - alert: LowCheckCount
        expr: sum(count_over_time({namespace="ai"} |= `check completed` [10m])) < 30
        for: 10m
        labels:
          route: alerts-ai
          severity: medium
          source: loki
        annotations:
          summary: Low check completion rate.
          details: Checks completed per 10 minutes - `{{ $value }}`. Expected rate - `30`.
Ruler config:
rulerConfig:
  alertmanager_url: "https://<mimir_am>:8080/alertmanager/"
  enable_alertmanager_v2: true
  enable_sharding: true
  remote_write:
    enabled: true
    clients:
      local:
        url: "http://haproxy.prometheus.svc.cluster.local:8080/api/v1/push"
  storage:
    gcs:
      bucket_name: hostinger-loki-ruler
  wal:
    dir: /var/loki/wal
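Nothing else ruler-related is overridden; in particular, with enable_sharding: true the ruler ring is left at the chart defaults. Spelled out explicitly, my understanding is that it amounts to something like this (my assumption, not something set in my values):

rulerConfig:
  ring:
    kvstore:
      store: memberlist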
At this point I am completely lost as to what could be causing this, so any help is appreciated.