Count the number of logs for each distinct region (e.g., north, south, etc.).
Dynamically detect all region values without manually specifying them.
Display the counts in a table or panel for all regions at once.
What I’ve Tried:
My query works for extracting region values, but it doesn’t group and count the logs for each distinct region.
I attempted count by (region) but couldn’t figure out how to dynamically detect and count distinct regions.
Desired Output: A table or panel showing:
Region Count
North 2
South 1
How can I achieve this dynamically for all region values detected in the logs?
{container_name="check-count"}
|~ "region":\s*"([^"]+)"
| regexp region":\s*"(?P<region>[^"]+)
| line_format "{{.region}}
sum by (region) (cont_over_time({QUERY} [$__interval])
But it returned this error:
parse error at line 5, col 1: syntax error: unexpected SUM
Then I modified the query to:
{container_name="check-count"}
|~ "region":\s*"([^"]+)"
| regexp region":\s*"(?P<region>[^"]+)
| line_format "{{.region}}
| sum by (region) (cont_over_time({QUERY} [$__interval])
And got this error:
parse error at line 5, col 3: syntax error: unexpected SUM
Finally, I tried the following query:
{container_name=“check-count”}
|~ "region":\s*\"([^\"]+)\"
| regexp region":\s*"(?P<region>[^"]+)
| unwrap region
| sum by (region) (count_over_time({container_name="check-count"}[$__interval]))
This returned the error:
parse error at line 4, col 3: syntax error: unexpected unwrap
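For context, the errors above come from using `sum by (...)` as a pipeline stage: in LogQL, metric aggregations must wrap the whole log query rather than follow a `|`, and `unwrap` is only valid inside sample-extracting range aggregations such as `sum_over_time`. A sketch of the shape the parser expects, reusing the selector and the `regexp` named group from the attempts above (untested):

```logql
sum by (region) (
  count_over_time(
    {container_name="check-count"}
      | regexp `"region":\s*"(?P<region>[^"]+)"`
    [$__interval]
  )
)
```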
The challenge is with the region field. It’s a comma-separated string, and my current query counts the entire string as one entity. For example, north,east is counted as one occurrence instead of counting north and east separately.
What I want is to split the region values by the comma and then count each individual value. So the result should look something like this:
I don’t think there is an easy way to do this. Hopefully others have some creative ideas.
Easiest solution to this is perhaps to split your log in the source (or somewhere after the source), essentially turning a log line such as [2024-12-19 00:04:27] {"region": "north,east", "host_id": "a"} into:

[2024-12-19 00:04:27] {"region": "north", "host_id": "a"}
[2024-12-19 00:04:27] {"region": "east", "host_id": "a"}
If that’s not feasible, here are a couple of workarounds I can think of, assuming you have a finite set of regions and you know what they are:
If you have a Mimir or Prometheus cluster, you can use recording rules and ship metrics to your Prometheus cluster, but you’ll probably need one rule for each possible value of region.
If you are visualizing from Grafana, you can create multiple queries under one graph, each query corresponding to one possible value for region. You should then end up with four queries (assuming north, east, south, and west), and you can use Grafana dashboard’s join feature (or other transformations) to join them together.
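For example, each of those per-region queries could look something like this (shown for north; the line-filter regex is an assumption about your log format, and you would swap `north` out in the other three queries):

```logql
sum(
  count_over_time(
    {container_name="check-count"}
      |~ `"region":\s*"[^"]*north[^"]*"`
    [$__interval]
  )
)
```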
You can also try something rather unwieldy. For example, we consider this query:
| pattern `[<_>] <json_body>`
| line_format `{{ .json_body }}` # Get JSON
| json # Parse JSON
| line_format `{{ if contains "north" .region }}north=true{{ end }} {{ if contains "east" .region }}east=true{{ end }} {{ if contains "south" .region }}south=true{{ end }} {{ if contains "west" .region }}west=true{{ end }}`
| logfmt
What this essentially does is turn your logs into <region>=true for each region present in the line. For example, [2024-12-19 00:04:27] {"region": "north,east", "host_id": "a"} becomes north=true east=true, and logfmt then turns those into labels. So with your example logs you’ll have this at the end:
Then you can aggregate by each of those regions, which should give you what you want:

sum by (north,east,south,west) (count_over_time(
<THE_QUERY>
))

I haven’t tried this, though, and there is a possibility that logfmt won’t extract multiple labels from the same line; if that’s the case, this won’t work.