How to index GUIDs in Loki so I can query them afterwards

Hello,

I have a web application that generates a unique GUID for each visitor, like f1b4fbc3-f8cb-48b9-ad3a-e93091d01f39.

I have many millions of such GUIDs per month, over 10 million.

I am ingesting the logs with these GUIDs converted into Loki labels.

The label key is ‘visitorguid’ and its value is the visitor's GUID, so the label can take millions of distinct values.

I then want to query in Grafana by a specific GUID and find all logs related to that GUID.

However, after my initial testing, it seems to be really slow, to the point that it does not work anymore.

How can I index these GUIDs in Loki so that it's fast when I query them?

Remember, each GUID can have multiple log lines.
For example:

f1b4fbc3-f8cb-48b9-ad3a-e93091d01f39 - user entered page with url of /gifts
f1b4fbc3-f8cb-48b9-ad3a-e93091d01f39 - user bought a panda bear
etc.

Some people might say that I need to narrow the logs down by time range or some other method, but that is not something I want to do. I just want to retrieve all the logs that carry a certain GUID label, without having to invest time in thinking about how to narrow them down.

It used to work flawlessly in Elastic, even with this amount of data, so why can't Loki index these labels so that I can query them? Is it because Loki is not designed to let me query a very specific log, but rather to look at logs from a high level?

You do not want to index them, nor do you need to. See "Label best practices" in the Grafana Loki documentation for best practices on labels.
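To illustrate why: in Loki, each unique combination of label names and values forms its own stream, so a per-visitor label creates one stream per visitor. A rough sketch in Python of that counting, where the `app` label, its value, and the second GUID are made up for illustration:

```python
# Illustration only: count how many streams a set of log entries would create.
# Loki treats each unique combination of label names and values as one stream.

def count_streams(entries):
    """Return the number of distinct label sets among (labels, line) entries."""
    return len({frozenset(labels.items()) for labels, _line in entries})

GUIDS = [
    "f1b4fbc3-f8cb-48b9-ad3a-e93091d01f39",
    "00000000-0000-4000-8000-000000000001",  # made-up second visitor
]

# Variant 1: GUID as a label, which gives one stream per visitor.
guid_as_label = [
    ({"app": "webshop", "visitorguid": g}, "user entered page with url of /gifts")
    for g in GUIDS
]

# Variant 2: GUID kept inside the log line, which gives one stream for the app.
guid_in_line = [
    ({"app": "webshop"}, f"{g} - user entered page with url of /gifts")
    for g in GUIDS
]

streams_with_label = count_streams(guid_as_label)
streams_without_label = count_streams(guid_in_line)
```

With over 10 million GUIDs per month, the first variant would mean over 10 million streams, which is the situation the label best practices warn against.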

We usually use labels to define the general purpose and location of the logs (env name, region, hostname, etc.). For things like unique IDs, we would filter or parse for those at query time using LogQL.
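As a minimal sketch of what that could look like, the Python helper below builds a LogQL query that selects streams by a few low-cardinality labels and then filters lines with the `|=` (line contains) operator. The helper name `build_guid_query` and the `app` and `env` labels are assumptions for illustration; it also assumes the GUID is written into the log line rather than into a label.

```python
import re

# Accepts the usual 8-4-4-4-12 hexadecimal GUID layout.
GUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")

def build_guid_query(selector_labels, guid):
    """Build a LogQL query: a stream selector plus a line filter on the GUID.

    selector_labels: low-cardinality labels (hypothetical names such as app/env).
    guid: the visitor GUID expected to appear somewhere in the log line.
    """
    if not selector_labels:
        raise ValueError("LogQL needs at least one label matcher in the selector")
    if not GUID_RE.match(guid):
        raise ValueError(f"not a GUID: {guid!r}")
    # Sort the labels so the generated query is deterministic.
    selector = ", ".join(f'{k}="{v}"' for k, v in sorted(selector_labels.items()))
    # |= keeps only the lines that contain the given string.
    return f'{{{selector}}} |= "{guid}"'

query = build_guid_query(
    {"app": "webshop", "env": "prod"},
    "f1b4fbc3-f8cb-48b9-ad3a-e93091d01f39",
)
print(query)
```

The resulting string can be pasted into the query field in Grafana Explore; the GUID validation is only there to avoid building a malformed query.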

So you're saying I need to parse them in real time using LogQL.

Won't parsing millions of lines at runtime be very slow?

I want to search by GUID only and instantly get back all the logs that contain that GUID, whether it is in their metadata, the log line, or somewhere else.

Just filtering logs by string is pretty fast. Loki’s performance comes from distribution, so query speed is generally not a concern as long as your cluster is configured correctly, with at least a couple of readers for parallel processing.
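The idea behind that parallelism can be pictured as splitting one long time range into smaller sub-ranges that separate readers scan at the same time. The Python sketch below only illustrates that splitting step; it is not Loki's actual implementation, and the function name `split_range` and the one-hour interval are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

def split_range(start, end, interval):
    """Split [start, end) into consecutive sub-ranges of at most `interval`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    parts = []
    cursor = start
    while cursor < end:
        # The last sub-range is clipped so it never runs past `end`.
        upper = min(cursor + interval, end)
        parts.append((cursor, upper))
        cursor = upper
    return parts

# Example: a 3.5 hour query window split into one-hour pieces (assumed interval).
start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc)
sub_ranges = split_range(start, end, timedelta(hours=1))

for lo, hi in sub_ranges:
    print(lo.isoformat(), "->", hi.isoformat())
```

Each sub-range can then be searched independently, which is why adding readers tends to help a plain string filter over a long period.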