Issue triage schedules for Alerting issues?

kritikmeister · September 17, 2024, 11:53am

Hello guys!
I’m very sorry for posting a question that is not about Grafana’s functionality itself, but about the processes around GitHub issues.

We’re a company that recently migrated from Legacy to Unified Alerting, and we are facing some issues with both the frontend and the core logic.
We consider some of these issues critical, and the team’s triaging docs seem to confirm that.

For example, this doc states that a bug that causes data loss should be considered critical. #93337 is such a bug, and the doc says it should be “someone’s top priority right now”, but it has had no reaction for 4 days.
#92974 broke our alert logic massively, and it has been waiting for 12 days already.

I wonder whether I filled something in incorrectly, or whether some automation failed to label the issues accordingly, so I decided to start this topic.

If all I need to do is wait, could you please tell me whether there is an ETA for a response, so I know when to start worrying that my issue has been missed completely?

Thank you!

antonio · September 19, 2024, 10:21am

Hello @kritikmeister

We sincerely apologize for the inconvenience and frustration this issue has caused. We understand how critical this matter is, especially when it leads to data loss.

Up until recently, our process for triaging issues was done manually on a best-effort basis, which regrettably led to backlogs and delays. This was the reason for the delay in addressing your issue.

However, we’ve made significant improvements to our workflow. We’ve recently implemented an automated system that now assigns issues directly to the appropriate engineering teams, eliminating the backlog. I can confirm that both issues you referenced have already been triaged by this new automation. The Alerting team should get back to you as soon as business allows.

As a side note, we acknowledge that the triage documentation you referred to may need revision to accurately reflect our new automated process. We will work on updating it to provide clearer information on how we handle issues moving forward.

Regarding #93337, I could reproduce the issue as described, and I will reach out to the team internally to ensure it’s addressed promptly.

Thank you for your patience.

Repro notes for reviewer:

After I edited an existing alert rule, created a new evaluation group, and saved and exited the rule, Grafana silently failed to save the new evaluation interval. The interval (highlighted in blue) is always 1m, even though I chose 5m and 10m for 2 of the 3 alert rules.
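For anyone who wants to check the same symptom without clicking through the UI, here is a minimal sketch in Python. It assumes you already have the saved rule groups as a list of dictionaries (for example, exported from your instance) and that each one carries a `title` and an `interval` in seconds; those field names, the group names, and the helper functions are illustrative assumptions, not a confirmed Grafana schema.

```python
from typing import Dict, List, Tuple


def parse_interval(text: str) -> int:
    """Convert a simple interval such as '30s', '5m' or '1h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]


def find_interval_mismatches(
    groups: List[dict], expected: Dict[str, str]
) -> List[Tuple[str, int, int]]:
    """Return (group title, expected seconds, saved seconds) for each mismatch.

    `groups` is assumed to be a list of dicts with 'title' and 'interval'
    (seconds) keys; this shape is hypothetical and may differ from your export.
    """
    mismatches = []
    for group in groups:
        title = group["title"]
        if title not in expected:
            continue  # no expectation recorded for this group
        want = parse_interval(expected[title])
        if group["interval"] != want:
            mismatches.append((title, want, group["interval"]))
    return mismatches


# Hypothetical data mirroring the repro: 5m and 10m were chosen, 1m was saved.
saved_groups = [
    {"title": "group-a", "interval": 60},
    {"title": "group-b", "interval": 60},
    {"title": "group-c", "interval": 60},
]
chosen = {"group-a": "5m", "group-b": "10m", "group-c": "1m"}

for title, want, got in find_interval_mismatches(saved_groups, chosen):
    print(f"{title}: chose {want}s but {got}s was saved")
```

A non-empty result would point to the same silent failure described above: the interval chosen in the editor was not the one that was saved.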

kritikmeister · September 20, 2024, 7:44am

Hi @antonio,
Thank you so much!
