Is |~ + | regexp faster than | regexp alone?

Hi. I have quite a lot of logs and I am researching how to optimize my queries. I am parsing some logs in CSV format. Consider the following:

{job="JOBS"}
| regexp `^(?P<time>[^,]*),(?P<category>[^,]*),(?P<etc>[^,]*)`
| time =~ `1[05].*`

vs:

{job="JOBS"}
|~       `^(?P<time>[^,]*),(?P<category>[^,]*),(?P<etc>[^,]*)`
| regexp `^(?P<time>[^,]*),(?P<category>[^,]*),(?P<etc>[^,]*)`
| time =~ `1[05].*`

From my understanding, | regexp does not filter lines; it only extracts labels. The actual filtering happens at the | time =~ stage, where lines without a matching time label are excluded. Because non-matching lines are only dropped late in the pipeline, the first query might be slower. In the second query, such lines are excluded right away by the |~ line filter, but the regex is presumably compiled (and evaluated) twice. Is there a way to extract labels and filter lines in a single step?

My question is, which one is expected to be faster? Or it does not matter? Thanks!

Normally I’d say reducing the number of log lines to be processed by filtering them early is good practice.

But in your case I don’t think it really matters. Because your logs are in CSV format, all the line filter in your second example does is check that each line has at least three comma-separated fields, which isn’t very selective. Essentially, both of the following lines would pass your regex filter:

10,category,something_etc
clearly_not_time,123,456

So in this case I’d just go with your first example and match on the label afterwards.
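That said, if you want the line filter to actually pay off, one option (just a sketch, assuming the time field is always the first CSV field and starts at the beginning of the line) is to make the line filter as selective as the final label match, so the parser only runs on lines that can pass:

{job="JOBS"}
|~       `^1[05][^,]*,`
| regexp `^(?P<time>[^,]*),(?P<category>[^,]*),(?P<etc>[^,]*)`
| time =~ `1[05].*`

Here the line filter runs before the parser stage, so lines whose first field cannot match `1[05].*` are discarded without paying the cost of label extraction; the later | time =~ check should then match almost trivially.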

Hi, thanks for the response.
Actually, many log lines under the same stream labels are not CSV, and the CSV lines have around 11 fields. I would say roughly the lower half, maybe 40%, of the logs are not CSV.
With these assumptions in mind, do you think your opinion changes?
Thank you.