I am trying to write a Loki LogQL query that finds log lines matching common error strings, extracts the offending part of each line, and then groups the results by cluster and app, so that Grafana does not spam me with a separate alert for every matching log line.
My current query returns one result for each matched log line:
sum(count_over_time({app =~ "(app.*)",cluster="contoso1"} |~ "(?i)Exception|Error" | pattern
[1m])) by (container,cluster,logSnippet)
The query returns results like the following, so Grafana alerts me three times:
cluster=contoso1, app=app1, logSnippet="Exception found:"
cluster=contoso1, app=app1, logSnippet="Error 503"
cluster=contoso1, app=app1, logSnippet="Server responded with an error."
I would like to group these results and combine the individual logSnippet values into a single label that I can use in my Grafana alert template.
This is what I would like to achieve:
cluster=contoso1, app=app1, logSnippet="Exception found: \n Error 503 \n Server responded with an error."
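To make the target concrete, here is a minimal Python sketch of the grouping, applied client-side to the query results rather than inside LogQL. The sample series mirror the three results above; the count value of 1 per series, the separator, and the helper name `combine_snippets` are assumptions for illustration only.

```python
from collections import defaultdict

# The three series from the results above, modeled as (labels, value) pairs.
# The count value of 1 per series is assumed for illustration.
series = [
    ({"cluster": "contoso1", "app": "app1", "logSnippet": "Exception found:"}, 1),
    ({"cluster": "contoso1", "app": "app1", "logSnippet": "Error 503"}, 1),
    ({"cluster": "contoso1", "app": "app1", "logSnippet": "Server responded with an error."}, 1),
]


def combine_snippets(series, group_labels=("cluster", "app"),
                     snippet_label="logSnippet", sep="\n"):
    """Hypothetical helper: group series by group_labels and join their
    snippet label values with sep. Returns one label dict per group,
    in first-seen order."""
    groups = defaultdict(list)
    for labels, _value in series:
        key = tuple(labels.get(name, "") for name in group_labels)
        snippet = labels.get(snippet_label)
        # Skip duplicates so the same snippet is not listed twice.
        if snippet is not None and snippet not in groups[key]:
            groups[key].append(snippet)
    result = []
    for key, snippets in groups.items():
        combined = dict(zip(group_labels, key))
        combined[snippet_label] = sep.join(snippets)
        result.append(combined)
    return result


for row in combine_snippets(series):
    print(row)
```

The three series collapse into one row per (cluster, app) pair, with the snippets joined by newlines, which is the single label I would like to reference in the alert template.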
If I wrap my query in another sum() by (cluster, app), I lose the logSnippet label entirely.
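The label loss follows from how a `sum by (...)` vector aggregation works: only the labels named in `by` survive, and the values of all series sharing those labels are added together. The Python below is a simplified model of that behavior (not Loki's implementation); `sum_by` is a hypothetical name and the sample series are the three results above with an assumed count of 1 each.

```python
from collections import defaultdict

# Simplified model of a LogQL/PromQL "sum by (...)" aggregation.
series = [
    ({"cluster": "contoso1", "app": "app1", "logSnippet": "Exception found:"}, 1),
    ({"cluster": "contoso1", "app": "app1", "logSnippet": "Error 503"}, 1),
    ({"cluster": "contoso1", "app": "app1", "logSnippet": "Server responded with an error."}, 1),
]


def sum_by(series, by):
    """Hypothetical model: keep only the labels in `by`, add up the values."""
    totals = defaultdict(float)
    for labels, value in series:
        # Every label not named in `by` (here: logSnippet) is discarded.
        key = tuple((name, labels[name]) for name in by if name in labels)
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


print(sum_by(series, by=("cluster", "app")))
```

The result is a single series with a count of 3 and no logSnippet label, which matches what I see in Grafana: the grouping works, but the snippet text is gone.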
Is there any solution to this?
Thanks!