Deal with non-standard nginx logs in loki/promtail


I’m not sure what the best approach is for dealing with non-standard (if I may call them that) Nginx logs, such as lines without an actual HTTP request, or with a request that doesn’t contain the HTTP method, request URI and/or HTTP version. For example:

    - - [30/Dec/2023:07:00:06 +0000] "" 400 0 "-" "-" "-"
    - - [30/Dec/2023:06:26:20 +0000] "\x16\x03\x01\x00\xEE\x01\x00\x00\xEA\x03\x03\xD0\xCF\x9D/\xBE[\xEE\xC8\x9AG,\xCB\x00\x00\x8C\x05Qw\xE2VI,\xC9Y\x9A~\xB3F1\x8B>\xEA \x14ne\xD4\x9AZ\xEEp\xBC/8\xAA\x0Fw\x1C\xFC\xA3\xAE\x83\x96\xEFC\xD4\xEBT\x9By~\x12\x07\x5CF\x00&\xC0+\xC0/\xC0,\xC00\xCC\xA9\xCC\xA8\xC0\x09\xC0\x13\xC0" 400 157 "-" "-" "-"
    - - [30/Dec/2023:06:26:20 +0000] "\x16\x03\x01\x00\xCA\x01\x00\x00\xC6\x03\x03\x18j\xA5/\xB3w\xAA\xDD@\xC1\xB4er\xEF\xEE\x09W\x9D\xB8\xE5\xEFS\xE9\x8C\xD6\xDB4\xED,\xDB\x91\x8E\x00\x00h\xCC\x14\xCC\x13\xC0/\xC0+\xC00\xC0,\xC0\x11\xC0\x07\xC0'\xC0#\xC0\x13\xC0\x09\xC0(\xC0$\xC0\x14\xC0" 400 157 "-" "-" "-"

As you can see, this is different from a traditional HTTP request such as:

    - - [30/Dec/2023:10:40:06 +0000] "GET /login HTTP/1.1" 200 35153 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" "-"

where the request is split into three parts (method, URI, HTTP version).

For this I’m using the following regex in promtail:

^(?P<host>[\w\.]+) - (?P<user>[^ ]*) \[(?P<ts>.*)\] "(?P<method>[^ ]*) (?P<request_url>[^ ]*) (?P<request_http_protocol>[^ ]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"?

It matches the last line, but fails on the first ones.
How do you normally treat these cases? On the internet I only see solutions that simply ignore them, but I don’t think that’s really useful, especially when these requests become abusive and you want to act on them.
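The failure can be reproduced with a small Go program, since Go's regexp package implements the RE2 syntax that Promtail's regex stage uses. The expression is copied unchanged from above. The client address 203.0.113.7 is a placeholder (the expression requires a host and the posted lines show none), and the user agent and TLS payload are shortened for readability.

```go
package main

import (
	"fmt"
	"regexp"
)

// The first expression from the post, unchanged.
var original = regexp.MustCompile(`^(?P<host>[\w\.]+) - (?P<user>[^ ]*) \[(?P<ts>.*)\] "(?P<method>[^ ]*) (?P<request_url>[^ ]*) (?P<request_http_protocol>[^ ]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"?`)

// Sample lines based on the post. 203.0.113.7 is a placeholder address;
// the user agent and TLS payload are shortened. Raw strings keep nginx's
// \xNN escapes as literal text, exactly as they appear in the log file.
var (
	normalLine = `203.0.113.7 - - [30/Dec/2023:10:40:06 +0000] "GET /login HTTP/1.1" 200 35153 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "-"`
	emptyLine  = `203.0.113.7 - - [30/Dec/2023:07:00:06 +0000] "" 400 0 "-" "-" "-"`
	tlsLine    = `203.0.113.7 - - [30/Dec/2023:06:26:20 +0000] "\x16\x03\x01\x00\xEE\x01" 400 157 "-" "-" "-"`
)

func main() {
	for _, line := range []string{normalLine, emptyLine, tlsLine} {
		// The request part expects three space-separated tokens followed by
		// a closing quote; empty or binary requests have no such structure.
		fmt.Println(original.MatchString(line), line)
	}
}
```

Only the GET line prints `true`; for the other two the whole line fails, so none of the fields (including status) are extracted.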

I guess that depends on what you want to do with the information.

If you are creating a graph aggregating the number of requests by method / URI / HTTP version, then it makes sense to ignore lines that don’t parse, because even if they did parse, those fields would be empty anyway.

So if you are looking to determine how severe the empty log lines are, perhaps you can specifically match log lines that don’t come with URI information (two consecutive double quotes), then match log lines that do have URI information (a double quote, then .+, then a double quote), compare the counts of the two, and create alerts if necessary.

The most important part (at least for now) would be the HTTP status codes: for instance, counting the lines with a status >= 400, which I know how to do.

I guess I can add a panel with this information without having to know what the request looks like.
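As a sketch of such a status-only extraction, the Go snippet below (RE2 syntax, as in Promtail's regex stage) captures just the status code that follows the quoted request field, so the same expression works for normal, empty and binary requests. It assumes the request field never contains a raw double quote; the sample lines are illustrative, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Capture only the status after the quoted request, whatever the request
// contains (assumes no raw double quote inside the request field).
var statusRe = regexp.MustCompile(`\] "[^"]*" (?P<status>\d{3}) `)

// statusOf extracts the HTTP status code from one access-log line.
func statusOf(line string) (int, bool) {
	m := statusRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	code, err := strconv.Atoi(m[statusRe.SubexpIndex("status")])
	if err != nil {
		return 0, false
	}
	return code, true
}

// countAtLeast counts lines whose status is >= minStatus.
func countAtLeast(lines []string, minStatus int) int {
	n := 0
	for _, line := range lines {
		if code, ok := statusOf(line); ok && code >= minStatus {
			n++
		}
	}
	return n
}

// Illustrative lines in the format from the post (TLS payload shortened).
var sampleLines = []string{
	`- - [30/Dec/2023:07:00:06 +0000] "" 400 0 "-" "-" "-"`,
	`- - [30/Dec/2023:06:26:20 +0000] "\x16\x03\x01\x00\xEE\x01" 400 157 "-" "-" "-"`,
	`- - [30/Dec/2023:10:40:06 +0000] "GET /login HTTP/1.1" 200 35153 "" "Mozilla/5.0" "-"`,
}

func main() {
	fmt.Println("status >= 400:", countAtLeast(sampleLines, 400))
}
```

Because the expression never looks inside the request, the two malformed lines are counted alongside ordinary 4xx responses.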

On the other hand, I also want to see the logs themselves by status code.

Now that I think about it, I suppose you’re right: it probably doesn’t make much sense to handle both cases (three split strings vs. a random one). On the other hand, what if I wanted to count or sort logs by HTTP method? I wouldn’t be able to do that if I just used ".+" for the request field.

At the moment, this is what I’ve come up with to match both cases (three split words vs. a random request):

'^(?P<host>\d{1,3}(?:\.\d{1,3}){3}) - (?P<user>[^ ]*) \[(?P<ts>[^\]]+)\] "(?:(?P<method>[A-Z]+) (?P<request_url>[^\s]+) (?P<request_http_protocol>[^\s"]+)|(?P<request_random>.*?))" (?P<status>\d{3}) (?P<bytes_out>\d+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"(\s+"(?P<http_x_forwarded_for>[^"]*)")?'

Of course this is temporary; it works, and it helps me understand Promtail/Loki better in the meantime.
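To check the combined expression against all three kinds of line, here is a Go sketch (Go's regexp implements the RE2 syntax Promtail uses). The expression is copied from the post without its surrounding quotes. Because it requires an IPv4 client address, the sample lines use the placeholder 203.0.113.7; the user agent and TLS payload are shortened. The `fields` helper is hypothetical and only reports groups that captured something.

```go
package main

import (
	"fmt"
	"regexp"
)

// The combined expression from the post (without the surrounding quotes).
var combined = regexp.MustCompile(`^(?P<host>\d{1,3}(?:\.\d{1,3}){3}) - (?P<user>[^ ]*) \[(?P<ts>[^\]]+)\] "(?:(?P<method>[A-Z]+) (?P<request_url>[^\s]+) (?P<request_http_protocol>[^\s"]+)|(?P<request_random>.*?))" (?P<status>\d{3}) (?P<bytes_out>\d+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"(\s+"(?P<http_x_forwarded_for>[^"]*)")?`)

// fields returns the named groups that captured a non-empty value.
func fields(line string) (map[string]string, bool) {
	m := combined.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	out := make(map[string]string)
	for i, name := range combined.SubexpNames() {
		if name != "" && m[i] != "" {
			out[name] = m[i]
		}
	}
	return out, true
}

// 203.0.113.7 is a placeholder address; user agent and TLS payload shortened.
var (
	normalLine = `203.0.113.7 - - [30/Dec/2023:10:40:06 +0000] "GET /login HTTP/1.1" 200 35153 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "-"`
	emptyLine  = `203.0.113.7 - - [30/Dec/2023:07:00:06 +0000] "" 400 0 "-" "-" "-"`
	tlsLine    = `203.0.113.7 - - [30/Dec/2023:06:26:20 +0000] "\x16\x03\x01\x00\xEE\x01" 400 157 "-" "-" "-"`
)

func main() {
	for _, line := range []string{normalLine, emptyLine, tlsLine} {
		f, ok := fields(line) // indexing a nil map below is safe in Go
		fmt.Printf("matched=%v method=%q request_random=%q status=%q\n",
			ok, f["method"], f["request_random"], f["status"])
	}
}
```

All three lines match: the GET line fills method/request_url/request_http_protocol, the TLS line lands in request_random, and the empty request yields an empty request_random while status is still extracted.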

Maybe using ".+" for the request field in Promtail is better after all (I think "[^"]*" might be more efficient, but I’m not 100% sure), and when I need something more specific (HTTP method, request URI, HTTP version), I add that logic in Loki? Would that make sense?
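On the efficiency question: Promtail's regex stage takes RE2 expressions, and RE2 (like Go's regexp) matches in time linear in the input, so the choice between .+ and [^"]* is mostly about correctness. With nginx's default log escaping, a double quote inside a logged value is written as \x22, so [^"]* always stops at the request field's own closing quote, while .+ depends on the rest of the expression to find it. The Go sketch below shows a two-stage split under those assumptions: capture the whole request with [^"]*, then run a second expression on that value only. Promtail's regex stage has a `source` option for running an expression on a previously extracted value, which is one way to express this in the pipeline; doing the split in Loki queries is the alternative you describe. Group names follow the post; the "(none)" bucket and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Stage 1: the request is whatever sits between the quotes.
	lineRe = regexp.MustCompile(`\] "(?P<request>[^"]*)" \d{3} `)
	// Stage 2: applied only to the extracted request value.
	requestRe = regexp.MustCompile(`^(?P<method>[A-Z]+) (?P<request_url>\S+) (?P<request_http_protocol>\S+)$`)
)

// methodOf returns the HTTP method of a log line, "(none)" when the request
// is not a "METHOD URI PROTOCOL" triple, and false if the line does not parse.
func methodOf(line string) (string, bool) {
	m := lineRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	r := requestRe.FindStringSubmatch(m[lineRe.SubexpIndex("request")])
	if r == nil {
		return "(none)", true
	}
	return r[requestRe.SubexpIndex("method")], true
}

// countByMethod tallies parsed lines per HTTP method.
func countByMethod(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if method, ok := methodOf(line); ok {
			counts[method]++
		}
	}
	return counts
}

// Illustrative lines in the format from the post (TLS payload shortened).
var sampleLines = []string{
	`- - [30/Dec/2023:07:00:06 +0000] "" 400 0 "-" "-" "-"`,
	`- - [30/Dec/2023:06:26:20 +0000] "\x16\x03\x01\x00\xEE\x01" 400 157 "-" "-" "-"`,
	`- - [30/Dec/2023:10:40:06 +0000] "GET /login HTTP/1.1" 200 35153 "" "Mozilla/5.0" "-"`,
}

func main() {
	fmt.Println(countByMethod(sampleLines))
}
```

Every line still parses in the first stage, so status-based panels keep working, and only well-formed requests contribute a method.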

By all means, if you are looking for the status code and URI, then you’d want to match the three words. What I meant was that if you also want to see the difference between logs with and without URI information, you can compare the counts from matching an empty string and a non-empty string.

Hello everyone.

I see you know how and where exactly to add the regex that breaks the nginx logs into fields, so they can be accessed in Loki queries.

I have the following logs in my nginx container:

    - - [16/Mar/2024:05:45:21 +0000] "GET /media/system/js/table-columns.min.js?8dc1188dfffb40a81af4102305fe0d623c977746 HTTP/2.0" 200 1303 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0" 136 0.008 [default-joomla-service-80] [] 1303 0.008 200 ea2a20422e04ce4d4fc7a74317c2f29d
    - - [16/Mar/2024:05:45:22 +0000] "GET /media/templates/administrator/atum/images/logos/brand-large.svg HTTP/2.0" 200 3344 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0" 90 0.003 [default-joomla-service-80] [] 3344 0.003 200 189c0ec93e058785cf98b5b9a5d68e24
    - - [16/Mar/2024:05:45:22 +0000] "GET /media/templates/administrator/atum/images/select-bg.svg HTTP/2.0" 304 0 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0" 178 0.002 [default-joomla-service-80] [] 0 0.002 304 1585003c4761f39390f0d55f7af0154f

I want to extract all possible fields (for now, the primary fields I need are the response codes and response time).
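A sketch for those two primary fields, tested in Go against the three lines above (Promtail's regex stage uses the same RE2 syntax, so the expression itself should carry over, modulo YAML quoting). It assumes the columns follow the ingress-nginx controller's default log format, in which the two numbers after the user agent are $request_length and $request_time (request processing time in seconds), followed by the upstream name in brackets; if the container uses a custom log_format, the mapping needs adjusting. The expression stops after the upstream name, so it does not depend on the upstream columns that follow. Type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Assumed column order (ingress-nginx default format):
// "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
// $request_length $request_time [$proxy_upstream_name] ...
var accessRe = regexp.MustCompile(`"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)" (?P<request_length>\d+) (?P<request_time>[\d.]+) \[(?P<upstream>[^\]]*)\]`)

type access struct {
	Status      int
	RequestTime float64 // seconds, assuming the column is $request_time
	Upstream    string
}

// parseAccess extracts status, request time and upstream name from a line.
func parseAccess(line string) (access, bool) {
	m := accessRe.FindStringSubmatch(line)
	if m == nil {
		return access{}, false
	}
	status, err := strconv.Atoi(m[accessRe.SubexpIndex("status")])
	if err != nil {
		return access{}, false
	}
	rt, err := strconv.ParseFloat(m[accessRe.SubexpIndex("request_time")], 64)
	if err != nil {
		return access{}, false
	}
	return access{Status: status, RequestTime: rt, Upstream: m[accessRe.SubexpIndex("upstream")]}, true
}

// The three lines exactly as posted.
var sampleLines = []string{
	`- - [16/Mar/2024:05:45:21 +0000] "GET /media/system/js/table-columns.min.js?8dc1188dfffb40a81af4102305fe0d623c977746 HTTP/2.0" 200 1303 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0" 136 0.008 [default-joomla-service-80] [] 1303 0.008 200 ea2a20422e04ce4d4fc7a74317c2f29d`,
	`- - [16/Mar/2024:05:45:22 +0000] "GET /media/templates/administrator/atum/images/logos/brand-large.svg HTTP/2.0" 200 3344 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0" 90 0.003 [default-joomla-service-80] [] 3344 0.003 200 189c0ec93e058785cf98b5b9a5d68e24`,
	`- - [16/Mar/2024:05:45:22 +0000] "GET /media/templates/administrator/atum/images/select-bg.svg HTTP/2.0" 304 0 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0" 178 0.002 [default-joomla-service-80] [] 0 0.002 304 1585003c4761f39390f0d55f7af0154f`,
}

func main() {
	for _, line := range sampleLines {
		if a, ok := parseAccess(line); ok {
			fmt.Printf("status=%d request_time=%.3fs upstream=%s\n", a.Status, a.RequestTime, a.Upstream)
		}
	}
}
```

If the first column (client address) is present in the real logs, the expression still matches, since it is not anchored to the start of the line.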

My Loki/Promtail was deployed using Helm on my Kubernetes cluster.

Can someone please tell me where exactly I need to add the regex to create fields/labels in Loki, so that I can query them and build dashboards?