JSON parser for specific logs

Hey everyone, hope you are all doing well. I'm stuck and could use some help:

  • We're shipping logs to Loki with Promtail.

  • We have the log entry below, from which we need to extract the status code, RequestDateTime and ResponseDateTime.

  • However, the log line is huge, so we can't write a regex to extract labels from it. We need some guidance on how to parse the log to pull out the data we need.

  • For example, we can handle the sample below, but our line is much larger. How can we create a similar expression for it?

  • 127.0.0.1 - - [01/Jan/2023:21:33:40 +0100] "GET /grafana_local/api/search?dashboardUIDs=alP6m1c4k&limit=30 HTTP/1.1" 200 525 "http://localhost/grafana_local/?orgId=1" "Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"

  • {job="apache"} | pattern `<ip> - - [<dttm>] "<method> <resource> <protocol>" <status> <object_size> "<referer>" "<user_agent>"`

  • {job="apache"} | pattern `<ip> - - [<_>] "<_> <_> <_>" <_> <_> "<_>" "<_>"`

2023-05-29T12:46:57.437Z 0123abcd-xz12-9kjh INFO { type: 'LOGS RESPONSE', information: '{"channel":"myapplication","principalId":"abcd142s","statusCode":200,"headers":{"OperationName":"GetDetails","MessageId":"0123abcd-xz12-9kjh","RequestDateTime":"2023-05-29T20:46:57.424","ResponseDateTime":"2023-05-29T20:46:57.437","X-Frame-Options":"deny","X-XSS-Protection":"1; mode=block","X-Content-Type-Options":"nosniff","Strict-Transport-Security":"max-age=31536000 always; includeSubDomains"},"body":"{\"result\":[{\"number\":\"123456\",\"alias\":\"alpha\",\"age\":\"10\",\"channel\":[\"myapplication\"],\"isLinked\":false,\"position\":0,\"Detail\":\"\",\"quickAction\":[\"|SecondaryDefault|1\",\"|SecondaryDefault|2\",\"|SecondaryDefault|3\",\"|SecondaryDefault|4\",\"|SecondaryDefault|5\",\"|SecondaryDefault|6\",\"|SecondaryDefault|7\"]}]}"}' }
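The `pattern` expression in the Apache sample above behaves roughly like a regex with one named group per `<placeholder>`. Here is a minimal Python sketch of that correspondence, run on the Apache line from the question. `APACHE_RE` is a hypothetical name, and the regex only approximates what Loki's `pattern` parser does. It is not how Loki implements it.

```python
import re

# Hypothetical regex approximating the LogQL pattern expression above:
# each <placeholder> becomes a named group of the same name.
APACHE_RE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<dttm>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<object_size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# The Apache sample line from the question.
LINE = (
    '127.0.0.1 - - [01/Jan/2023:21:33:40 +0100] '
    '"GET /grafana_local/api/search?dashboardUIDs=alP6m1c4k&limit=30 HTTP/1.1" '
    '200 525 "http://localhost/grafana_local/?orgId=1" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"'
)

match = APACHE_RE.match(LINE)
fields = match.groupdict()  # every named group, keyed by placeholder name
print(fields["status"], fields["object_size"])  # 200 525
```

This works for short, flat lines. The trouble with the line above is that most of it is nested, quoted JSON, which a flat positional expression like this handles badly.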

I think your biggest problem is that your log line looks like JSON but isn't (particularly the part { type: 'LOGS RESPONSE', information:), which means you can't parse it with the json parser. It's also hard to parse with regexp or pattern because of all the single, double, and escaped quotes.
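A quick way to see this is a minimal Python sketch on a shortened version of the wrapper, with only `statusCode` kept inside `information`. The standard `json` module rejects the wrapper because of the unquoted keys and single-quoted strings, while the payload inside `information` decodes fine on its own.

```python
import json

# Shortened copy of the wrapper from the question: unquoted keys and
# single-quoted strings are not valid JSON.
wrapper = "{ type: 'LOGS RESPONSE', information: '{\"statusCode\":200}' }"

try:
    json.loads(wrapper)
    wrapper_is_json = True
except json.JSONDecodeError:
    wrapper_is_json = False

# The payload inside `information` is valid JSON by itself.
inner = json.loads('{"statusCode":200}')

print(wrapper_is_json, inner["statusCode"])  # False 200
```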

In my opinion, the best approach is to get rid of the non-JSON part, in this case the top-level type and information wrapper. Using your example, your log line would become:

2023-05-29T12:46:57.437Z 0123abcd-xz12-9kjh INFO {"channel":"myapplication","principalId":"abcd142s","statusCode":200,"headers":{"OperationName":"GetDetails","MessageId":"0123abcd-xz12-9kjh","RequestDateTime":"2023-05-29T20:46:57.424","ResponseDateTime":"2023-05-29T20:46:57.437","X-Frame-Options":"deny","X-XSS-Protection":"1; mode=block","X-Content-Type-Options":"nosniff","Strict-Transport-Security":"max-age=31536000 always; includeSubDomains"},"body":"{\"result\":[{\"number\":\"123456\",\"alias\":\"alpha\",\"age\":\"10\",\"channel\":[\"myapplication\"],\"isLinked\":false,\"position\":0,\"Detail\":\"\",\"quickAction\":[\"|SecondaryDefault|1\",\"|SecondaryDefault|2\",\"|SecondaryDefault|3\",\"|SecondaryDefault|4\",\"|SecondaryDefault|5\",\"|SecondaryDefault|6\",\"|SecondaryDefault|7\"]}]}"}

Then you can parse this relatively easily like so:

{SELECTOR} | regexp `\S* \S* \S* (?P<json_body>.*)` | line_format "{{.json_body}}" | json
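To sanity-check the idea outside Loki, here is a minimal Python sketch that mirrors the query above. The same regex drops the timestamp, request ID and level, the rest is parsed as JSON, and statusCode plus the two header timestamps are read out. `extract_fields` and `response_ms` are hypothetical names. The millisecond difference is computed by the sketch and is not a field in the log. In Loki itself, the `json` stage flattens nested keys with underscores, so the header values should appear as labels such as `headers_RequestDateTime` and `headers_ResponseDateTime`.

```python
import json
import re
from datetime import datetime, timedelta

# Same regex as the LogQL `regexp` stage: skip timestamp, request id and
# level, capture the remainder as json_body.
PREFIX_RE = re.compile(r"\S* \S* \S* (?P<json_body>.*)")

def extract_fields(line: str) -> dict:
    """Pull statusCode and the two header timestamps out of a reduced log line."""
    match = PREFIX_RE.match(line)
    if match is None:
        raise ValueError("line is not '<ts> <id> <level> <json>'")
    doc = json.loads(match.group("json_body"))
    headers = doc["headers"]
    requested = datetime.fromisoformat(headers["RequestDateTime"])
    responded = datetime.fromisoformat(headers["ResponseDateTime"])
    return {
        "statusCode": doc["statusCode"],
        "RequestDateTime": headers["RequestDateTime"],
        "ResponseDateTime": headers["ResponseDateTime"],
        # Derived value, not in the log: elapsed time in milliseconds.
        "response_ms": (responded - requested) / timedelta(milliseconds=1),
    }

# The reduced sample line from above, split into raw-string pieces.
SAMPLE = (
    r'2023-05-29T12:46:57.437Z 0123abcd-xz12-9kjh INFO '
    r'{"channel":"myapplication","principalId":"abcd142s","statusCode":200,'
    r'"headers":{"OperationName":"GetDetails","MessageId":"0123abcd-xz12-9kjh",'
    r'"RequestDateTime":"2023-05-29T20:46:57.424","ResponseDateTime":"2023-05-29T20:46:57.437",'
    r'"X-Frame-Options":"deny","X-XSS-Protection":"1; mode=block",'
    r'"X-Content-Type-Options":"nosniff",'
    r'"Strict-Transport-Security":"max-age=31536000 always; includeSubDomains"},'
    r'"body":"{\"result\":[{\"number\":\"123456\",\"alias\":\"alpha\",\"age\":\"10\",'
    r'\"channel\":[\"myapplication\"],\"isLinked\":false,\"position\":0,\"Detail\":\"\",'
    r'\"quickAction\":[\"|SecondaryDefault|1\",\"|SecondaryDefault|2\",\"|SecondaryDefault|3\",'
    r'\"|SecondaryDefault|4\",\"|SecondaryDefault|5\",\"|SecondaryDefault|6\",'
    r'\"|SecondaryDefault|7\"]}]}"}'
)

fields = extract_fields(SAMPLE)
print(fields["statusCode"], fields["response_ms"])  # 200 13.0
```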

I'm not sure what you currently do during log ingestion, so you might need to get creative.

Another option is to use a regex to discard everything except the JSON under information, for example by capturing everything between the third single quote and the last single quote. In my opinion, though, this is less reliable, because it depends on the quoting of the wrapper never changing.
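Here is a minimal Python sketch of that alternative, run against the original wrapped line. It skips to the third single quote and captures greedily up to the last one. The regex and the `INFORMATION_RE` name are assumptions for illustration. The regex avoids lookarounds, so it should be RE2-compatible for LogQL's `regexp` stage, but treat that as an assumption too. The sample is a shortened copy of the line from the question, with headers and body trimmed.

```python
import json
import re

# Hypothetical regex: skip up to the 3rd single quote, then capture greedily
# up to the LAST single quote (the one closing `information`).
INFORMATION_RE = re.compile(r"^[^']*'[^']*'[^']*'(?P<information>.*)'")

# Shortened copy of the wrapped line from the question (headers and body trimmed).
LINE = (
    "2023-05-29T12:46:57.437Z 0123abcd-xz12-9kjh INFO "
    "{ type: 'LOGS RESPONSE', information: '"
    '{"channel":"myapplication","statusCode":200,'
    '"headers":{"RequestDateTime":"2023-05-29T20:46:57.424",'
    '"ResponseDateTime":"2023-05-29T20:46:57.437"},'
    r'"body":"{\"result\":[{\"number\":\"123456\",\"alias\":\"alpha\"}]}"}'
    "' }"
)

match = INFORMATION_RE.match(LINE)
payload = json.loads(match.group("information"))
# `body` is itself a JSON document stored as a string, so it needs a second decode.
body = json.loads(payload["body"])

print(payload["statusCode"], body["result"][0]["number"])  # 200 123456
```

The fragility shows up if the `type` value ever contains a single quote: the third quote moves, and the capture starts in the wrong place.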

Thanks, I will try this and let you know if I need more help.

Not sure if it's an option for you, but why not configure Apache to output JSON-formatted logs? I haven't tried this since I don't use Apache, but a quick web search suggests it's possible. We do it for Nginx.

Personally, I always try to address log parsing as early as possible.

I'm not using Apache; I'm getting the logs from CloudWatch using lambda-promtail.