Is there an optimal way to detect duplicate logs using a transformation for a large volume of data, say 60k-80k log lines? I have tried, but it only works for a small amount of data.
I am not sure there is a good way to do this. Keep in mind that while logs might seem identical to you, they most likely only appear identical and actually vary slightly. For example, if your logs have timestamps, the timestamps would differ. Any sort of ID (such as a process ID) would likely differ as well. So in most cases it is far easier to search for a substring instead of trying to match whole log lines.
But if you really want to do it, there is a template function that base64-encodes a string. You can use it to encode the log line, assign the result to a label, and then count and aggregate on that label. Be warned: try this with a small dataset first, otherwise you can end up running a query whose aggregation generates tens of thousands of labels, which is most certainly not great for your cluster health.
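As a rough sketch, assuming you are querying Grafana Loki, it could look something like the query below. The selector `{app="myapp"}`, the label name `line_b64`, and the `[5m]` window are placeholders, and it relies on the `__line__` and `b64enc` template functions being available in your Loki version:

```logql
sum by (line_b64) (
  count_over_time(
    {app="myapp"}
      | label_format line_b64="{{ __line__ | b64enc }}"
    [5m]
  )
)
```

Any series with a value greater than 1 is a line that occurred more than once in the window. With 60k-80k lines this can easily explode into tens of thousands of series, which is exactly the cardinality risk mentioned above.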