I’m having trouble getting the right data when using Grafana Managed Alerts on OSS 8.3.6. I’m pretty new to this tool so I’m probably missing something.
I have an alert set up for high CPU (over 90% for 5 minutes), and a couple of servers go into the Firing state each night at the same time.
When I receive the email for the Firing state, there’s no data about the alert. But when I receive the email for the Resolved state, there’s data in the Value field.
I haven’t quite figured out the templating for labels and annotations, but I was hoping the default email template would give me enough information about the alert to know what was happening.
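From what I can tell from the docs, a custom summary annotation can pull query values into the notification, something like the following. (This is just my reading of the templating docs; `B` is a guess at the RefID of my reduce/threshold expression, and I haven’t verified this works on 8.3.6.)

```
CPU on {{ $labels.instance }} has been at {{ $values.B.Value }}% for 5 minutes
```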
Welcome to the forum, @bzink.
Grafana’s transition from legacy alerting to the Unified Alerting platform represents a big step forward. But, there are many factors that can influence behavior, and it is often hard for the community to troubleshoot issues without a thorough understanding of your unique setup. Try to include the following info:
- What is your Grafana version?
- Are you using Grafana Cloud or self-hosted Grafana?
- Are you using legacy alerting or Unified Alerting?
- Was the alert in question migrated from the legacy platform into Unified Alerting, or did you first create it inside the new platform?
- Please list ALL configuration options related to alerting. You can find these in the Unified Alerting sections of Grafana’s config file. If you are now using or have previously used the beta version of ngalert (released with Grafana 8), please note that too.
- You can use this table to better understand how configuration options can interact with each other
- If this is a templating issue on Unified Alerting, check if your alert is using a multi-dimensional rule or not.
- List the data source associated with the alert
- Increase the verbosity of the Grafana server logs to debug and note any errors. For printing to console, set the console logs to debug as well.
- Search for open issues on GitHub that sound similar to your problem
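For reference, the relevant bits of `grafana.ini` for the last two points look roughly like this (values here are illustrative, not a recommendation for your setup):

```ini
[unified_alerting]
# the new alerting platform (Grafana 8+)
enabled = true

[alerting]
# legacy alerting; normally off when Unified Alerting is on
enabled = false

[log]
# raise server log verbosity while troubleshooting
level = debug

[log.console]
level = debug
```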
@bzink Did you find a solution? I have the same issue when the data source is either CloudWatch (on AWS) or Azure Monitor.
I have messed around with increasing the group wait time on the notification policy, and that seemed to help for a while, but the problem came back after a few days. I don’t see much in any of the logs on the server.
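For anyone else looking: the group wait setting lives on the notification policy in the Grafana UI. Since Grafana’s built-in alertmanager is based on Prometheus Alertmanager, what I changed is roughly equivalent to this Alertmanager-style config (values illustrative; the receiver name is just the Grafana default):

```yaml
route:
  receiver: grafana-default-email
  group_wait: 2m        # raised from the 30s default to see if it helped
  group_interval: 5m
  repeat_interval: 4h
```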
@mviggia1 No, I haven’t found a solution for this. I’ve just been dealing with it; I haven’t had a ton of time to dig into it.