Failed to evaluate queries and expressions: alert rule panic; please check the logs for the full stack

Hi,

Since upgrading to 9.5.9 I am seeing a lot of these:

Failed to evaluate queries and expressions: alert rule panic; please check the logs for the full stack

Does anybody know what the issue is?

Regards

Tobias

More info

error="runtime error: index out of range [0] with length 0"

Hi! :wave: Would it be possible to share the full stack trace? This isn’t enough information to be able to help much, I’m afraid.

logger=ngalert.eval rule_uid=cbe9a35f-7e29-48bc-b482-c780ba0b009d org_id=1 t=2023-09-13T09:13:11.056604532Z level=error msg="alert rule panic" error="runtime error: index out of range [0] with length 0" stack="goroutine 4708 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/grafana/grafana/pkg/services/ngalert/eval.(*conditionEvaluator).EvaluateRaw.func1()\n\tgithub.com/grafana/grafana/pkg/services/ngalert/eval/eval.go:58 +0x7d\npanic({0xb2e480?, 0xc006b083a8?})\n\truntime/panic.go:914 +0x21f\ngithub.com/grafana/grafana/pkg/expr.(*DSNode).Execute(0xc002347760, {0x1633840, 0xc002192fc0}, {0xc007de8f68?, 0x33a51d3?, 0x61579e0?}, 0x7f12613c7518?, 0xc000b6b3b0)\n\tgithub.com/grafana/grafana/pkg/expr/nodes.go:317 +0x171b\ngithub.com/grafana/grafana/pkg/expr.(*DataPipeline).execute(0xc007de90f0, {0x1633840, 0xc002192fc0}, {0x34bb150?, 0xc002192fc0?, 0x61579e0?}, 0x61579e0?)\n\tgithub.com/grafana/grafana/pkg/expr/graph.go:53 +0xfe\ngithub.com/grafana/grafana/pkg/expr.(*Service).ExecutePipeline(0x1633798?, {0x1633840, 0xc002192fc0}, {0x60?, 0x58?, 0x61579e0?}, {0xc00b4de840, 0x3, 0x3})\n\tgithub.com/grafana/grafana/pkg/expr/service.go:68 +0xcf\ngithub.com/grafana/grafana/pkg/services/ngalert/eval.(*conditionEvaluator).EvaluateRaw(0xc00371aba0?, {0x1633798?, 0xc00ac7c240?}, {0x0?, 0x3?, 0x61579e0?})\n\tgithub.com/grafana/grafana/pkg/services/ngalert/eval/eval.go:74 +0x139\ngithub.com/grafana/grafana/pkg/services/ngalert/eval.(*conditionEvaluator).Evaluate(0xc00371aba0, {0x1633798?, 0xc00ac7c240?}, {0xc0094eeb00?, 0x5f8c908?, 0x61579e0?})\n\tgithub.com/grafana/grafana/pkg/services/ngalert/eval/eval.go:79 +0x51\ngithub.com/grafana/grafana/pkg/services/ngalert/schedule.(*schedule).ruleRoutine.func3({0x1633798, 0xc00ac7c240}, 0x202631d31b39047c?, 0x0?, 0xc00a9dbbc8, {0x1644700, 0xc0095a02c0})\n\tgithub.com/grafana/grafana/pkg/services/ngalert/schedule/schedule.go:386 +0x643\ngithub.com/grafana/grafana/pkg/services/ngalert/schedule.(*schedule).ruleRoutine.func5.2(0x48?)\n\tgithub.com/grafana/grafana/pkg/services/ngalert/schedule/schedule.go:519 +0xa8b\ngithub.com/grafana/grafana/pkg/services/ngalert/schedule.(*schedule).ruleRoutine.func4(0xc006578f50)\n\tgithub.com/grafana/grafana/pkg/services/ngalert/schedule/schedule.go:446 +0x59\ngithub.com/grafana/grafana/pkg/services/ngalert/schedule.(*schedule).ruleRoutine.func5(0x0?, 0xc0013add40, {0x7473657427203d20?, {0xc003fc4b10?, 0x5c72756f76616c66?}}, 0xc00a9dbbc8, 0xc005575e70, 0xc003fab050, {0x1644750, 0xc003b78330}, ...)\n\tgithub.com/grafana/grafana/pkg/services/ngalert/schedule/schedule.go:487 +0x1a6\ngithub.com/grafana/grafana/pkg/services/ngalert/schedule.(*schedule).ruleRoutine(0xc0013add40, {0x1633fb0, 0xc00374e0f0}, {0x22736e6f69746964?, {0xc003fc4b10?, 0x7b3a22726f746175?}}, 0xc00592c2a0, 0xc00592c300)\n\tgithub.com/grafana/grafana/pkg/services/ngalert/schedule/schedule.go:525 +0x7f6\ngithub.com/grafana/grafana/pkg/services/ngalert/schedule.(*schedule).processTick.func1()\n\tgithub.com/grafana/grafana/pkg/services/ngalert/schedule/schedule.go:261 +0x36\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 702\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x96\n"
logger=ngalert.scheduler rule_uid=cbe9a35f-7e29-48bc-b482-c780ba0b009d org_id=1 version=174 fingerprint=202631d31b39047c attempt=0 now=2023-09-13T09:13:00Z t=2023-09-13T09:13:11.056654513Z level=error msg="Failed to evaluate rule" err
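For anyone following along, the relevant part of the trace is the index-out-of-range panic inside expr.(*DSNode).Execute at nodes.go:317. The snippet below is not Grafana’s actual code, just a minimal stand-in (with a made-up Frame type) illustrating the failure class: indexing the first frame of a datasource response without first checking that the response contains any frames, which panics exactly like the log above when a query returns no data.

```go
package main

import "fmt"

// Frame is a stand-in for a single data frame returned by a datasource
// query (the real types live in the Grafana plugin SDK).
type Frame struct {
	Name   string
	Values []float64
}

// firstFrameUnsafe mirrors the failure class: it assumes at least one frame
// is present. With an empty response it panics with
// "index out of range [0] with length 0", just like the log above.
func firstFrameUnsafe(frames []*Frame) *Frame {
	return frames[0]
}

// firstFrameSafe is the defensive variant: an empty response is reported as
// "no data" instead of being indexed blindly.
func firstFrameSafe(frames []*Frame) (*Frame, bool) {
	if len(frames) == 0 {
		return nil, false
	}
	return frames[0], true
}

func main() {
	var empty []*Frame // a query that matched no series returns no frames

	if _, ok := firstFrameSafe(empty); !ok {
		fmt.Println("query returned no frames; treating as no data")
	}

	// firstFrameUnsafe(empty) // would panic: index out of range [0] with length 0
}
```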

Does that help?

Yes that’s perfect, thank you!

Just a theory: it looks like InfluxDB queries that return no time series data result in the panic mentioned above. This seemed to be fine before I upgraded, or maybe the upgrade somehow triggered something. I am not sure why we have alert rules configured where the query does not return any values, but I guess that is another question.

Did anything change in alerting so that queries returning no time series data now throw an error? For now I am setting the error handling on these alerts to Error → OK to silence them.

It is worth mentioning that we were on version 9.5.5 before and did not see this behavior.

This is a regression that appeared in 9.5.9 when one of the changes from the main branch was backported; some code was missed because it had been refactored in 10.1.
It also happens in 10.0.x when the feature flag disableSSEDataplane is enabled (the flag is disabled by default).
It does not happen in 10.1.x and newer versions.

I opened a pull request to fix it in 10.0.x; it will then be backported to 9.5: SSE: Fix DSNode to not panic when response has empty response by yuri-tceretian · Pull Request #74866 · grafana/grafana · GitHub
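This is not the code from the pull request above, but if you want to guard against this class of regression in your own Go code, a small test around the empty-response case is usually enough to catch it. The firstValue helper here is purely hypothetical:

```go
package alertquery // sketch only; lives in a file like alertquery_test.go

import "testing"

// firstValue is a stand-in for a helper that reads the first value of a
// series; the guarded version reports ok=false for an empty series instead
// of panicking with an index-out-of-range error.
func firstValue(series []float64) (v float64, ok bool) {
	if len(series) == 0 {
		return 0, false
	}
	return series[0], true
}

// TestFirstValueEmptySeries pins down the scenario from this thread: a
// query that returns no data must not panic the evaluation.
func TestFirstValueEmptySeries(t *testing.T) {
	if _, ok := firstValue(nil); ok {
		t.Fatal("expected ok=false for an empty series")
	}
	if v, ok := firstValue([]float64{42}); !ok || v != 42 {
		t.Fatalf("expected (42, true), got (%v, %v)", v, ok)
	}
}
```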

Thanks for fixing this so quickly. We will deploy the version with the fix once it’s available then.