Hi,
Since the latest release of Grafana (7.5), the Loki datasource supports alerting. Previously you could work around this restriction by creating a Prometheus datasource that pointed to Loki and setting alerts on that. That workaround is no longer necessary, which is great, but I'm running into an issue regarding certificates.
This is pretty weird, because I set the CA certificate on all my datasources and it worked perfectly for both my PromLoki and my Prometheus datasources. When I looked into the logs I noticed that alerts were sent in different ways. In the logs below the URL is removed for security reasons, but you can see that requests are processed differently between the datasource types.
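For reference, this is roughly how the CA certificate is set on each datasource — a provisioning sketch, where the datasource name, URL, and certificate body are placeholders (the same settings are available in the datasource's TLS section in the UI):

```yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    url: https://loki.example.internal   # placeholder URL
    jsonData:
      tlsAuthWithCACert: true            # tell Grafana to verify with the CA below
    secureJsonData:
      tlsCACert: |
        -----BEGIN CERTIFICATE-----
        (internal CA certificate goes here)
        -----END CERTIFICATE-----
```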
Prometheus (PromLoki) query requests and alerts
t=2021-04-13T07:45:19+0000 lvl=info msg="Proxying incoming request" logger=data-proxy-log userid=13 orgid=1 username=[user] datasource=prometheus uri="/api/datasources/proxy/4/api/v1/query_range?query=count_over_time(%7Bapp%3D%22app-test%22%7D%7C~%22(INFO%7CERROR)%22%5B5m%5D)&start=1618278315&end=1618299915&step=15" method=GET body=
t=2021-04-13T07:49:01+0000 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=6 newState=pending prev state=no_data
t=2021-04-13T07:49:01+0000 lvl=info msg="Alert already updated" logger=alerting.resultHandler
t=2021-04-13T07:52:01+0000 lvl=info msg="New state change" logger=alerting.resultHandler ruleId=6 newState=ok prev state=pending
Loki query requests and alerts
t=2021-04-13T07:57:55+0000 lvl=info msg="Proxying incoming request" logger=data-proxy-log userid=13 orgid=1 username=[user] datasource=loki uri="/api/datasources/proxy/1/loki/api/v1/query_range?direction=BACKWARD&limit=1470&query=count_over_time(%7Bapp%3D%22app-test%22%7D%7C~%22(INFO%7CERROR)%22%5B5m%5D)&start=1618279075000000000&end=1618300676000000000&step=15" method=GET body=
2021/04/13 07:01:01 https://[url]/loki/api/v1/query_range?direction=BACKWARD&end=1618297261129128621&interval=1&limit=1000&query=count_over_time%28%7Bapp%3D%22app-test%22%7D%7C~%22%28INFO%7CERROR%29%22%5B5m%5D%29&start=1618296961129128621&step=1
t=2021-04-13T07:00:01+0000 lvl=eror msg="Alert Rule Result Error" logger=alerting.evalContext ruleId=6 name="Loki alert" error="tsdb.HandleRequest() error Get "https://[url]/loki/api/v1/query_range?direction=BACKWARD&end=1618297201154147598&interval=1&limit=1000&query=count_over_time%28%7Bapp%3D%22app-test%22%7D%7C~%22%28INFO%7CERROR%29%22%5B5m%5D%29&start=1618296901154147598&step=1": x509: certificate signed by unknown authority" changing state to=alerting
As seen above, the Loki query requests are sent through the /api/datasources/ proxy endpoint, but when the alert is evaluated it does not go through this endpoint and instead requests the full Loki URL directly. Our Grafana runs in a Docker container, and we have not added the specific certificate to the /etc/ssl/certs directory.
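If adding the CA to the container's system trust store is the expected fix, I assume it would look roughly like this — a sketch assuming the official Alpine-based Grafana image ships the ca-certificates tooling; my-internal-ca.crt is a placeholder file name:

```dockerfile
FROM grafana/grafana:7.5.2
USER root
# my-internal-ca.crt is a placeholder for our internal CA certificate
COPY my-internal-ca.crt /usr/local/share/ca-certificates/my-internal-ca.crt
# Rebuild /etc/ssl/certs so Go's x509 verification can find the CA
RUN update-ca-certificates
USER grafana
```

Still, it feels odd to need this at the image level when the datasource itself already has the CA certificate configured.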
Is this an implementation fault, or an intentional change that is also going to happen to the Prometheus alerts?