Questions about Loki, LogQL and JSON parsed fields

Apologies in advance if this is a super basic question, but I’m still new to the whole logging stack ecosystem and grappling with a lot of new vocabulary and concepts.

Here’s a summary of the environment I’ve set up so far:

  • Nginx configured so the access logs are JSON formatted with the details I care about, most notably user-agent strings.
  • Loki and Promtail (both 2.0.0) configured to index(?) those access logs
  • Grafana 7.3.1 with a data source pointing to the Loki instance

In the Explore view, I can run a basic query like {job="myjob"} and see all of the log entries for the selected time period. If I expand an entry, there’s a section for Log labels: and Parsed fields: and I’m trying to figure out how I filter on those parsed fields rather than just using regex against the whole line.

So something like this works to filter against the whole line:

{job="myjob"} |~ "searchvalue"

And I figured out that I can use the json parser expression(?) to turn the parsed fields into labels and query them like this:

{job="myjob"} | json | http_user_agent =~ `.*searchvalue.*`

But that feels wrong because it seems like I’m having to double parse the json. Is there a way to reference the “Parsed fields:” directly and filter on them? If so, is there a practical/performance difference between using regex on the whole line versus a parsed field?

Bonus question: can I further parse the user-agent string so that I can query/report on specific elements within it? Everything hitting this web instance is from a custom app sending customized user-agent strings rather than your typical messy browser user-agent, and should have a consistent “key/value key/value key/value” style format.


What happens if you just remove the | json filter and try to use the label filter (the | http_user_agent ... bit) without json parsing? I am not exactly sure but in my setup the parsed fields are already filterable just fine without having to do extra | json-ing on logs already in JSON!

Can I further parse the user-agent string such that I can query/report on specific elements within it?

You could run a regex pipeline stage in your promtail config if the format is super static, or use a regexp expression in your LogQL! See the Parser Expression section here https://grafana.com/docs/loki/latest/logql/#Parser-Expression which explains the | regexp <re> syntax.
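As a rough sketch of the LogQL route, assuming a hypothetical user-agent value such as "myapp/1.2.3 os/ios" (the keys and the app_version and os label names are made up for illustration): line_format replaces the log line with the extracted http_user_agent value, so the regexp parser only has to match that string, and each named capture group becomes a label you can filter on.

```
{job="myjob"} | json | line_format "{{.http_user_agent}}" | regexp `myapp/(?P<app_version>\S+) os/(?P<os>\S+)` | os = "ios"
```

The regexp parser needs at least one named capture group, and unlike the =~ label matcher it does not have to match the whole string.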

Removing the | json filter unfortunately just gives me no results anymore. In your setup, do you have promtail doing a json parsing step that turns the json fields into real labels first? I’m trying to avoid that because the Loki docs seem to advise against having too many dynamic labels.

I figured out the regex part in the end, but thanks for the suggestion. I added the regex processing to the promtail config and used the template functionality to rewrite the JSON output that gets sent to Loki so that it includes those regex-processed fields as individual JSON fields. I still need to do the | json step on the Grafana side, but at least it’s less work after that.
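A sketch of what a query looks like after that change, assuming the rewritten JSON line carries a field called app_version (a placeholder name used here for illustration only):

```
{job="myjob"} |= "1.2.3" | json | app_version = "1.2.3"
```

The leading |= line filter is optional. It is there on the assumption that discarding non-matching lines before the json stage leaves less for the parser to do.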
