Our production base cluster keeps firing alerts named “DatasourceError”. The underlying rule varies from day to day; common examples are “LokiRequestErrors” and “AlloyPodCrashLooping”. Sometimes the issue self-heals. During my research I found that the problem might be related to Prometheus Thanos being temporarily unavailable, although no alert explicitly indicates this. I need to identify why Thanos is sometimes unreachable. My assumption might be wrong, so could someone help me troubleshoot this? I’ll add my alert description below. Note that in the alert I couldn’t find the label values either!
FIRING:1 | DatasourceError |
1x Alerts Firing
Summary: Loki is experiencing request errors.
Description: [no value] [no value] in cluster [no value] is experiencing a high number of errors.
alertname: DatasourceError
grafana_folder: Observability Enablement
ref_id: A
rulename: LokiRequestErrors
sc_component: loki
sc_env: production
sc_provider: k8s
sc_system: oe
severity: P1
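For anyone hitting the same symptom: the “[no value]” labels suggest the rule itself never evaluated, i.e. Grafana could not reach the datasource, which is consistent with the Thanos theory. A minimal sketch of the checks I would run, assuming kubectl access to the cluster and a Grafana service-account token; the namespace `monitoring`, the label selector, and the `$DS_UID` / `$GRAFANA_URL` / `$GRAFANA_TOKEN` values are all placeholders to adjust for your setup:

```shell
# Are the Thanos query pods healthy? (namespace and label are assumptions)
kubectl -n monitoring get pods -l app.kubernetes.io/name=thanos-query

# Look for recent errors/restarts around the alert timestamps
kubectl -n monitoring logs deploy/thanos-query --since=1h | grep -i error

# Ask Grafana directly whether the datasource is healthy right now
# (GET /api/datasources/uid/:uid/health is part of the Grafana HTTP API)
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "$GRAFANA_URL/api/datasources/uid/$DS_UID/health"
```

If the health endpoint reports errors only intermittently, correlating its failures with the DatasourceError firing times would support the “Thanos briefly unavailable” hypothesis. Also worth checking is the rule’s “Error state” handling in Grafana Alerting: a rule configured to alert on evaluation error will fire DatasourceError with empty labels whenever the query fails, regardless of the actual metric.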