Hi,
I’ve come across an interesting issue with labels in Promtail and their transmission to Loki. It seems that if a label exceeds 1024 characters, errors occur during the transmission to Loki, potentially leading to data loss.
I would like to understand how I can limit the maximum label size to 1024 characters to avoid this. I’ve checked online resources and the official Grafana documentation, but I haven’t found specific information on how to set this limit. To address the problem, I tried the following regular expression in the Promtail configuration: '(?P<html>(?:.|\n){0,1024})'. However, I encountered an error:
level=error ts=2023-07-28T08:56:56.33794899Z caller=main.go:170 msg="error creating promtail" error="failed to make file target manager: invalid regex stage config: could not compile regular expression: error parsing regexp: invalid repeat count: `{0,1020}`
I was wondering if any of you have encountered a similar problem in the past or have any ideas on how we can address this situation.
Thanks in advance.
Hey @mirawara
Let’s take a step back. What are you trying to use as a label value here? A 1 KB label sounds like a poor fit, because values that large tend to be high-cardinality, which hurts query performance.
Thanks for your answer. I do this because the ModSecurity log is heavy, complex and dynamic, and extracting labels with regular expressions in Promtail lets me write simpler Loki queries for the Grafana dashboard. Do you think it is better to extract the information with ‘pattern’ in the query instead of having the labels ready?
Could you give me an example of one of the labels you’re trying to store?
If it’s highly dynamic, it’s probably not best suited for a label.
See this blogpost for some general guidance about labels:
For example, the label ‘modsec_info’ contains something like this:
ModSecurity: Warning. Matched "Operator `PmFromFile' with parameter `scanners-user-agents.data' against variable `REQUEST_HEADERS:User-Agent' (Value: `Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)' ) [file "/etc/nginx/modsec/coreruleset-3.3.4/rules/REQUEST-913-SCANNER-DETECTION.conf"] [line "34"] [id "913100"] [rev ""] [msg "Found User-Agent associated with security scanner"] [data "Matched Data: nmap scripting engine found within REQUEST_HEADERS:User-Agent: mozilla/5.0 (compatible; nmap scripting engine; https://nmap.org/book/nse.html)"] [severity "2"] [ver "OWASP_CRS/3.3.4"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-reputation-scanner"] [tag "paranoia-level/1"] [tag "OWASP_CRS"] [tag "capec/1000/118/224/541/310"] [tag "PCI/6.5.10"] [hostname "192.168.1.3"] [uri "/"] [unique_id "16893282798.722473"] [ref "o25,21v31,79t:lowercase"]
ModSecurity: Warning. Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `5' ) [file "/etc/nginx/modsec/coreruleset-3.3.4/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "81"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 5)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.4"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "192.168.1.3"] [uri "/"] [unique_id "16893282798.722473"] [ref ""]
The problem is that there can be multiple parts starting with ‘ModSecurity:’, depending on how many rules the request triggered.
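To make the structure of these entries concrete, here is a rough Go sketch that picks the short bracketed fields (such as id and severity) out of one entry. It is only an illustration in plain Go, not Promtail configuration: extractFields and fieldRe are hypothetical names, the sample line is a shortened copy of the first entry above, and when a key repeats (as tag does) only the first value is kept.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldRe matches bracketed key/value pairs such as [id "913100"].
var fieldRe = regexp.MustCompile(`\[(\w+) "([^"]*)"\]`)

// extractFields returns the first value found for each wanted key.
// Keys that are not requested are ignored.
func extractFields(entry string, wanted ...string) map[string]string {
	want := make(map[string]bool, len(wanted))
	for _, k := range wanted {
		want[k] = true
	}
	out := make(map[string]string)
	for _, m := range fieldRe.FindAllStringSubmatch(entry, -1) {
		key, val := m[1], m[2]
		if _, seen := out[key]; want[key] && !seen {
			out[key] = val
		}
	}
	return out
}

func main() {
	// Shortened copy of the first sample entry, for illustration only.
	entry := `ModSecurity: Warning. [line "34"] [id "913100"] [rev ""] ` +
		`[msg "Found User-Agent associated with security scanner"] ` +
		`[severity "2"] [tag "application-multi"] [tag "language-multi"]`

	fields := extractFields(entry, "id", "severity")
	fmt.Println("id:", fields["id"])
	fmt.Println("severity:", fields["severity"])
}
```

Fields like these are short and take few distinct values, unlike the full ‘modsec_info’ text.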
I don’t know of a built-in way to limit label length in Promtail, but you can raise the maximum length that Loki accepts. These settings belong to the limits_config block:
# Maximum length accepted for label names.
# CLI flag: -validation.max-length-label-name
[max_label_name_length: <int> | default = 1024]
# Maximum length accepted for label value. This setting also applies to the
# metric name.
# CLI flag: -validation.max-length-label-value
[max_label_value_length: <int> | default = 2048]
However, raising these limits is most likely not a good idea unless cluster performance is not a concern.
I do share your pain, however: logs from security tools tend to be all over the place and difficult to parse. I haven’t had to deal with ModSecurity, but I did have to work with NGINX App Protect recently. App Protect can log in JSON format; if ModSecurity can do the same, I’d recommend trying that.
Also, if you can provide an example of a source log line and what you’d like to see as the result, that would be helpful.