How to discard logs by default in promtail?

jennymals · January 17, 2023, 5:30am

We need to be able to process only the logs that match regular expressions, and the remaining logs should be dropped.
We tried with the following promtail config file:

> pipeline_stages:
>           - match:
>               selector: '{job="test1"}'
>               stages:
>               - regex:
>                   expression: 'some regular expression'
>               - timestamp:
>                   source: timestamp
>                   format: "2022-01-01 00:03:06.555"
>           - match:
>               selector: '{job="test1"}'
>               stages:
>               - regex:
>                   expression: 'some regular expression'
>               - timestamp:
>                   source: timestamp
>                   format: "2022-01-01 00:03:06.555"
>               - labels:
>                   alabel:
>           - drop:
>               expression: ".*.*"

With the above all logs are dropped because of the drop statement. If we remove it, all logs go through including those that do not match the regular expressions.
Any suggestions?

tonyswumac · January 17, 2023, 5:17pm

You need to do a regex capture so that you have something to match against. Assuming you are trying to catch the phrase “this is a legit log” in your log, maybe something like this would work:

pipeline_stages:
  - regex:
      expression: ^.*(?P<rcapture>this is a legit log).*$
      source: <IF_APPLICABLE>
  - labels:
      rcapture:
  - match:
      selector: '{rcapture="this is a legit log"}'
      stages:
        - DO_STUFF
  # Drop logs that didn't match.
  - match:
      selector: '{rcapture!="this is a legit log"}'
      action: drop
      drop_counter_reason: non_essential_log
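
A filled-in sketch of the configuration above, with the placeholders removed. The `rcapture` label only exists to drive the drop decision, so it can be removed again before the entry is sent to Loki. The trailing `labeldrop` stage is an addition and not part of the original suggestion; the phrase and label name are the same example values as above.

```yaml
pipeline_stages:
  # Capture the phrase into the extracted map; nothing is captured
  # when the line does not match.
  - regex:
      expression: '^.*(?P<rcapture>this is a legit log).*$'
  # Promote the capture to a label so a match selector can test it.
  - labels:
      rcapture:
  # Lines without the capture have no rcapture label, so this
  # selector matches them and they are dropped.
  - match:
      selector: '{rcapture!="this is a legit log"}'
      action: drop
      drop_counter_reason: non_essential_log
  # Remove the helper label from the entries that are kept.
  - labeldrop:
      - rcapture
```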

jennymals · January 17, 2023, 7:10pm

Is that the only way? For the logs that we want to drop, we won’t know what they will contain or whether they will match legit logs, i.e. logs that we want to go through. So, the regex might end up matching logs that we want.

yosiasz · January 17, 2023, 7:24pm

Please post sample logs and mark each one keep or discard. Also, look at this thread.

pooh · January 17, 2023, 7:26pm

Your original request was that “we need to be able to only process the logs
that matches regular expressions and the remaining logs should be dropped”,
which implies that you can create regexes which match the logs to be processed
(and the remainder dropped).

If that is not the case, how do you identify which logs are of interest and
which should be discarded?

In cases such as this I often find it useful to imagine I am asking a person to
do the job, and explaining to them what they need to pay attention to and what
they should ignore.

Once you can express that, it’s generally just a matter of asking someone who
knows more about regexes than you do (because nobody ever knows enough about
regexes to solve their current requirement) how to put this in terms that a
computer can work with.

Antony.

jennymals · January 17, 2023, 7:33pm

Sample logs:
[INFO] 2022-12-01 19:30 http code 404 Keep
[DEBUG] 2022-12-01 19:30 <some response time e.g. 15s> Discard
[INFO] 2022-12-01 19:30 request time 5 seconds Keep
[INFO] 2022-12-01 19:30 unknown text from applications Discard

Yes, we can use regex to get the http code and request time. Everything else should be discarded.
Do you mean we need to write a regex for each one to match and then negate it for the drop? That would typically be a very long regex.

What about making the default action “drop”, and keeping only the logs for which we explicitly define an action?
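
One way to approximate a "drop by default" rule without a helper label is a single `match` stage whose selector combines the stream selector with a negated line filter (`!~`), so that every line not matching the keep pattern is dropped. This is a minimal sketch, assuming the `job="test1"` label from the original config and keep patterns guessed from the sample lines above (`http code <number>` and `request time <number> seconds`); the `drop_counter_reason` value is made up.

```yaml
pipeline_stages:
  # Drop every line that does NOT contain one of the keep patterns.
  - match:
      selector: '{job="test1"} !~ "http code [0-9]+|request time [0-9]+ seconds"'
      action: drop
      drop_counter_reason: not_in_keep_list
  # Lines that reach this point matched a keep pattern; further stages
  # (regex, timestamp, labels) can follow here.
```

Keeping another kind of line then means adding one more alternative to the regex, rather than writing and negating a separate pattern for each case.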

yosiasz · January 17, 2023, 7:35pm

So the discard requirement is that it has only date time and nothing else?

You edited your response. So, your discard and keep look awfully identical.

jennymals · January 17, 2023, 7:36pm

Sorry, I had <unknown text from applications> in there and it was not shown after copy-paste.

jennymals · January 17, 2023, 7:38pm

Yes, all the logs start with the level and timestamp, followed by some random text. We know we need to match the request time and the http status code; everything else we don’t care about.

Another option is probably to drop on a source, e.g. timestamp: when we don’t declare a timestamp, we get some default timestamp, which we then look for in the drop section.

yosiasz · January 17, 2023, 7:57pm

Are there cases where you want to keep DEBUG?

yosiasz · January 17, 2023, 8:01pm

But you marked the log below as keep, and it has no http code.

[INFO] 2022-12-01 19:30 request time 5 seconds Keep

Please provide a clean and accurate requirement.

tonyswumac · January 17, 2023, 9:08pm

In this case then I’d say you are overthinking it. Logs are not like metrics, you can have junk data in your logs, as long as you have a way to filter out the part you don’t want later. In general, cost not being a consideration, it’s much better to keep your logging pipeline clean and easy and parse those logs for what you want down the line, provided you have the way to do so.

In this case, if all you want is the HTTP return code and request time, you can simply log everything to Loki, and parse them like so (don’t know your log structure, so just making up pattern):

For http code:

{some_label="some_value"}
  | pattern `[<_>] <_> <_> http code <code>`
  | __error__="" | unwrap code

For request time:

{some_label="some_value"}
  | pattern `[<_>] <_> <_> request time <time_second> seconds`
  | __error__="" | unwrap time_second

