Your alert rule will fire one alert for each endpoint, user, and report combination that exceeds 50 HTTP 500 errors in the last 5 minutes, because that’s how multi-dimensional alerting works. You can use uni-dimensional alerts (classic conditions), but those exist mainly to help people upgrading from the old alerting in Grafana 7 and older.
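To make the fan-out concrete, here is a rough sketch in Go. It is not Grafana's code, and the `Series` type, the `firingAlerts` helper, and the label names are all made up for illustration. It only shows the idea that a single rule checks every labelled series and produces one alert per label set that is over the threshold.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Series is one labelled result of the rule's query.
// Hypothetical shape for illustration; not a Grafana type.
type Series struct {
	Labels map[string]string
	Value  float64
}

// labelKey renders a label set in a stable, readable form.
func labelKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

// firingAlerts returns one entry per series whose value exceeds the
// threshold: one rule, but potentially many alerts.
func firingAlerts(series []Series, threshold float64) []string {
	var alerts []string
	for _, s := range series {
		if s.Value > threshold {
			alerts = append(alerts, labelKey(s.Labels))
		}
	}
	sort.Strings(alerts)
	return alerts
}

func main() {
	// Assumed sample data: HTTP 500 counts over the last 5 minutes.
	series := []Series{
		{Labels: map[string]string{"endpoint": "/login", "user": "alice"}, Value: 72},
		{Labels: map[string]string{"endpoint": "/login", "user": "bob"}, Value: 3},
		{Labels: map[string]string{"endpoint": "/report", "user": "alice"}, Value: 51},
	}
	for _, a := range firingAlerts(series, 50) {
		fmt.Println("firing:", a)
	}
}
```

The fewer labels the query keeps, the fewer separate alerts the rule can produce.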
Lol, no, that is not what I asked. I want to customize my alert message to include details from the logs or metrics that we are monitoring, i.e. the query results. I don’t plan to create a new alert for every endpoint, especially for HTTP errors like 500 or 404. I simply want to save our engineers time and effort when they explore the root cause, where possible.
In this example, if the application is down, it does not make sense to send out one alert for each endpoint. We want our query to be smart enough to figure out the root cause. Essentially, whatever our engineers would do manually to find the root cause, we plan to automate using Grafana monitoring.
Let me rephrase my question. Let’s forget everything else and focus on this for now. Does that help?
“I want to develop an alert template that allows us to share details from my monitoring query in my alert notification.”
I think you’re confusing alerts and alert rules? You create just one alert rule, but this alert rule can fire multiple alerts at the same time.
“I want to develop an alert template that allows us to share details from my monitoring query in my alert notification.”
This doesn’t happen in the template though. You need to get this information in your alerts first by adding data from your query to either labels or annotations. You then use notification templates to choose which labels or annotations to print in the notification.
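Grafana notification templates are written in Go's `text/template` language, so here is a small standalone Go program that imitates the second half of that flow. Treat it as a sketch: the `Alert` and `Data` types are simplified stand-ins I made up, and the `connector` label and `summary` annotation are assumed names. In Grafana itself the same template body would sit inside a `{{ define "..." }}` block in a notification template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert is a simplified stand-in for one alert in a notification.
type Alert struct {
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

// Data is a simplified stand-in for what a notification template receives.
type Data struct {
	Alerts []Alert
}

// message prints one line per alert, choosing which label and which
// annotation to show. "connector" and "summary" are assumed names.
const message = `{{ range .Alerts }}[{{ .Status }}] connector={{ .Labels.connector }}: {{ .Annotations.summary }}
{{ end }}`

// render executes the template against the given alerts.
func render(d Data) (string, error) {
	tmpl, err := template.New("connector.message").Parse(message)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(Data{Alerts: []Alert{{
		Status:      "firing",
		Labels:      map[string]string{"connector": "orders-sink"},
		Annotations: map[string]string{"summary": "Connector has failed"},
	}}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The template can only print what is already in `Labels` or `Annotations`, which is why the query has to produce those values first.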
That is my understanding too. What I really want to know is how we get this information into labels, because that part is not clear in the documentation, or maybe I am having a hard time understanding it.
Could you please guide and help me learn?
Thank you
What is your datasource? Can you share your query?
My datasources are
Kafka SQL,
Azure Monitor and
New Relic
In the future we may also add SQL Server as a data source.
See this for KSQL
https://api.telemetry.confluent.cloud/docs#tag/Version-2/paths/~1v2~1metrics~1{dataset}~1descriptors~1metrics/get
Edit:
Forgot to add Prometheus.
I’m not super familiar with those datasources. Can you share a screenshot perhaps of the query you have right now in your alert rule? If you can also show a screenshot having clicked the Preview button then I can see exactly what your query is doing and whether it is creating labels or not.
If that’s not possible, I can share an example using Prometheus and then all you need to do is change the query for Kafka/Azure Monitor/New Relic.
I blurred the instance details.
So this query should usually return zero, but we want to know:
- if it changes from zero to one
- which connector failed, which is part of the query results above.
OK great! So just using the sum function isn’t enough; you want to sum by the other labels that you want, for example the endpoint. Just using the sum function discards all the information that you want in your notification. There are some good examples of what that looks like in the Prometheus documentation under “Querying examples”.
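If it helps to see the difference outside of a query language, here is a small Go sketch of what the two aggregations do to labelled samples. The `Sample` type, the helper names, and the `connector` and `task` labels are assumptions for illustration, not anything taken from your query. In PromQL terms, `sumAll` corresponds to `sum(metric)` and `sumBy` to `sum by (connector) (metric)`.

```go
package main

import "fmt"

// Sample is one labelled data point returned by a query.
// Hypothetical shape for illustration.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// sumAll mimics a plain sum: every label is discarded and a single
// number remains, so the result cannot say which connector failed.
func sumAll(samples []Sample) float64 {
	total := 0.0
	for _, s := range samples {
		total += s.Value
	}
	return total
}

// sumBy mimics "sum by (label)": one total per distinct value of the
// chosen label, so that label survives into the result.
func sumBy(samples []Sample, label string) map[string]float64 {
	totals := make(map[string]float64)
	for _, s := range samples {
		totals[s.Labels[label]] += s.Value
	}
	return totals
}

func main() {
	// Assumed data: 1 means the connector task has failed.
	samples := []Sample{
		{Labels: map[string]string{"connector": "orders-sink", "task": "0"}, Value: 1},
		{Labels: map[string]string{"connector": "orders-sink", "task": "1"}, Value: 0},
		{Labels: map[string]string{"connector": "billing-source", "task": "0"}, Value: 0},
	}
	fmt.Println("sum:", sumAll(samples))
	fmt.Println("sum by connector:", sumBy(samples, "connector"))
}
```

With the plain sum you only learn that something failed; with the grouped sum each result still carries the connector it belongs to, and that is what ends up as a label on the alert.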
Let’s say I sum by connector name. So what will be the label I use? “Connector name”?
Because the documentation does not clarify that part.
For example here the doc says
Return all time series with the metric http_requests_total and the given job and handler labels:
Those are Prometheus labels, not Grafana ones, so if I am using KSQL directly or another data source like New Relic or Azure Monitor, how would it work?
In KSQL, New Relic, or Azure Monitor they will have their own version of labels. I don’t use New Relic or Azure Monitor, but it looks like New Relic calls them tags and Azure Monitor calls them namespaces/resources. You’ll need to refer to their documentation for that, but when queried in Grafana those will appear as labels.
So what will be the label I use? “Connector name”?
You’ll need to look at your tags in New Relic, or namespaces/resources in Azure Monitor. Those will have names, and those will be the names of the labels in your queries.
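Once you know the name, you reference it in an annotation or custom label on the alert rule with `$labels`, for example `{{ $labels.connector }}`. Since those templates use Go's `text/template` syntax, here is a standalone Go sketch of how the lookup behaves. The `expand` helper and the label names are made up, and the way `$labels` is set up here is my own emulation for the example, not Grafana's implementation. One thing it shows: a name that is not a plain identifier, such as one with a space, has to go through `index`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// expand renders an annotation-style template against one alert's labels.
// "$labels" is emulated here by prepending a variable definition.
func expand(text string, labels map[string]string) (string, error) {
	tmpl, err := template.New("annotation").Parse("{{ $labels := .Labels }}" + text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, struct{ Labels map[string]string }{labels})
	return sb.String(), err
}

func main() {
	// Assumed label names; check what your data source actually returns.
	labels := map[string]string{
		"connector":      "orders-sink",
		"connector name": "orders-sink",
	}

	// Dot access works when the label name is a plain identifier.
	simple, err := expand(`Connector {{ $labels.connector }} failed`, labels)
	if err != nil {
		panic(err)
	}
	fmt.Println(simple)

	// Names with spaces or other special characters need index.
	spaced, err := expand(`Connector {{ index $labels "connector name" }} failed`, labels)
	if err != nil {
		panic(err)
	}
	fmt.Println(spaced)
}
```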
Thank you. That makes sense.
I will look into that, experiment with it for the rest of the day, and get back to you tomorrow.
Appreciate your patience and responses.
Regards,