Is there an optimal way to detect duplicate logs using a transformation for a large volume of data, say 60k-80k log lines? I have tried, but it only works for a small amount of data.
I am not sure there is a good way to do this. Keep in mind that while logs might seem identical to you, they most likely only appear identical and actually vary slightly. For example, if your logs have timestamps, the timestamps would differ. Any sort of ID (such as a process ID) would likely differ as well. So in most cases it is far easier to search for a substring instead of trying to match whole log lines.
But if you really want to do it, there is a template function that base64-encodes a string. You can use it to encode the log line, assign the result to a label, and then count and aggregate on that label. Be warned: try this with a small dataset first, otherwise you can end up running a query whose aggregation generates tens of thousands of labels, which is most certainly not great for your cluster health.
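As a rough sketch, assuming you are querying Grafana Loki, it could look something like the query below. The selector `{app="myapp"}`, the label name `line_b64`, and the `[5m]` window are placeholders, and it relies on the `__line__` and `b64enc` template functions being available in your Loki version:

```logql
sum by (line_b64) (
  count_over_time(
    {app="myapp"}
      | label_format line_b64="{{ __line__ | b64enc }}"
    [5m]
  )
)
```

Any series with a value greater than 1 is a line that occurred more than once in the window. With 60k-80k lines this can easily explode into tens of thousands of series, which is exactly the cardinality risk mentioned above.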