Filtering haproxy logs using pattern instead of regex

There are two types of HAProxy log lines: TCP and HTTP.
May 30 10:16:41 zv2096-abc nbid-12345[3046181]: 2601:1418:2101::1638:a7db:57784 [30/May/2024:10:16:40.960] listener12345 backend12345/node12345678 1/0/315 2989 -- 9/9/8/0/0 0/0
May 30 10:16:41 zv2104-def nbid-45678[156515]: 136.252.25.169:1033 [30/May/2024:10:16:34.441] listener56789~ backend456789/node98876543 0/0/0/1/6616 200 1087673 - - ---- 26/26/22/14/0 0/0 "GET /blahblahblabload/Stasdfasdfer/1masdfIf.zip HTTP/1.1"

This is the regex I use for TCP lines:
(?P<date_time>\w+ \d+ \S+) (?P<nil>\S+) nbid-(?P<nbid>\d+)\[(?P<pid>\d+)\]: (?P<client_ip>\S+):(?P<client_port>\d+) \[(?P<request_date>\S+)\] (?P<frontend_name>\S+) (?P<backend_name>\S+)/(?P<server_name>\S+) (?P<Tw>\d+)/(?P<Tc>\d+)/(?P<Tt>\d+) (?P<bytes_read>\S+) (?P<termination_state>\S+) (?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\d+) (?P<srv_queue>\d+)/(?P<backend_queue>\d+)*$

This is the regex I use for HTTP lines:
(?P<date_time>\w+ \d+ \S+) (?P<nil>\S+) nbid-(?P<nbid>\d+)\[(?P<pid>\d+)\]: (?P<client_ip>\S+):(?P<client_port>\d+) \[(?P<request_date>\S+)\] (?P<frontend_name>\S+) (?P<backend_name>\S+)/(?P<server_name>\S+) (?P<TR>\d+)/(?P<Tw>\d+)/(?P<Tc>\d+)/(?P<Tr>\d+)/(?P<Ta>\d+) (?P<status_code>\S+) (?P<bytes_read>\S+) *(?P<request_cookie>\S+) (?P<response_cookie>\S+) (?P<termination_state>\S+) (?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\d+) (?P<srv_queue>\d+)/(?P<backend_queue>\d+) "(?P<method>\S+) (?P<url_path>[^"]+) (?P<version>\S+)" *$

From what I have read, patterns are preferred and recommended over regex for parsing log lines. Here is the pattern I use for TCP lines:
<_> <_> <_> <nil> nbid-<nbid>[<pid>]: <client_ip>:<client_port> [<request_date>] <frontend_name> <backend_name>/<server_name> <Tw>/<Tc>/<Tt> <bytes_read> <termination_state> <actconn>/<feconn>/<beconn>/<srv_conn>/<retries> <srv_queue>/<backend_queue> <_>

It works well for TCP lines, but it also matches HTTP lines, and I then get unexpected values for some of the variables.
Is there a way to rewrite the pattern so that it does not match HTTP log lines and filters them out?
Thank you

From a quick test it seems to match your HTTP example just fine, perhaps except the part - - ---- 26.

Since your TCP and HTTP logs are slightly different, you should hopefully have them in different log files, and when ingesting logs into Loki you should have filename as one of the labels. You can then use the filename to determine whether a stream holds HTTP or TCP logs and adjust your pattern string accordingly.
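
As a rough sketch of that idea, the two separate queries below show how the TCP pattern from the question could be restricted to TCP lines. The file paths in the `filename` selectors are made up for illustration. The second query is an alternative for the case where both formats share one file: it assumes that only HTTP lines contain the text `HTTP/`, which holds for the two sample lines in this thread but may not hold in every setup.

```logql
# Query 1: select only the stream that carries TCP lines.
# The path is hypothetical; use whatever your agent sets as the filename label.
{filename="/var/log/haproxy-tcp.log"}
  | pattern `<_> <_> <_> <nil> nbid-<nbid>[<pid>]: <client_ip>:<client_port> [<request_date>] <frontend_name> <backend_name>/<server_name> <Tw>/<Tc>/<Tt> <bytes_read> <termination_state> <actconn>/<feconn>/<beconn>/<srv_conn>/<retries> <srv_queue>/<backend_queue> <_>`

# Query 2: both formats in one file (hypothetical path).
# The line filter drops every line containing "HTTP/" before the pattern parser runs.
{filename="/var/log/haproxy.log"}
  != `HTTP/`
  | pattern `<_> <_> <_> <nil> nbid-<nbid>[<pid>]: <client_ip>:<client_port> [<request_date>] <frontend_name> <backend_name>/<server_name> <Tw>/<Tc>/<Tt> <bytes_read> <termination_state> <actconn>/<feconn>/<beconn>/<srv_conn>/<retries> <srv_queue>/<backend_queue> <_>`
```

Filtering before the parser means the pattern never sees HTTP lines, so it cannot assign shifted values to labels such as Tt or actconn.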

Thank you Tony, good to know that I can use file name as a label. I will give it a try.

Another challenge here: is there a way to sum the values of two labels from the same log line and then apply a range function to the result? For example, I would like to sum the values of the following labels:

/(?P<Tc>\d+)/(?P<Tr>\d+)/
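
One possible approach, sketched here under several assumptions, is to compute the sum into a new label with `label_format` and then unwrap that label inside a range aggregation. The `filename` value is hypothetical, the pattern is a shortened HTTP pattern that keeps only Tc and Tr, the label name `Tc_plus_Tr` is invented for this example, and the sketch relies on the `add` template function accepting the extracted label values, so it is worth checking against your Loki version.

```logql
# Sketch only: hypothetical selector, shortened HTTP pattern, invented label name.
# Sums Tc + Tr per log line, then totals those sums over 5-minute windows per file.
sum by (filename) (
  sum_over_time(
    {filename="/var/log/haproxy-http.log"}
      | pattern `<_> <_> <_> <_> nbid-<_>[<_>]: <_> [<_>] <_> <_>/<_> <_>/<_>/<Tc>/<Tr>/<_> <_>`
      | label_format Tc_plus_Tr=`{{ add .Tc .Tr }}`
      | unwrap Tc_plus_Tr
    [5m]
  )
)
```

The outer `sum by (filename)` is there because the extracted labels would otherwise stay on the result and split it into one series per distinct Tc/Tr combination.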
