Manager here. I am new to Loki and have a few questions

Hi, manager here. I am new to Loki and would like to ask about a use case scenario for my teams of developers.

Currently, my teams use Prometheus + Grafana to track the speed of functions in our web services over time, count how many times each function is called, count how many of each HTTP response code occurred, and so on. Our apps keep all these statistics internally and Prometheus scrapes them periodically. Then Grafana comes in for visualization. Things work fine.
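
To illustrate, our dashboards are built on PromQL along these lines (the metric names here are placeholders, not our real ones):

```promql
# Calls per second to each function/handler
sum by (handler) (rate(http_requests_total[5m]))

# 95th-percentile request duration, from a histogram metric
histogram_quantile(0.95, sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m])))

# How many of each HTTP response code occurred in the last 5 minutes
sum by (code) (increase(http_requests_total[5m]))
```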

But I just heard about Loki, and another team (not mine) has just begun using it. I see that they have successfully brought their logs into Loki and Grafana. They can also pull out data such as the time taken, which HTTP response was generated for each request, and all other information related to each request. They also get alerts when particular errors occur in the logs.

  1. For the above scenario, I see that Prometheus has already scraped all the data into its time series database. Grafana can simply fetch these numbers, plot them, and fire alerts. But for logs, I see that Loki has an engine to manage all the strings. One advantage I see is that you can write anything to the log files and filter it all out later. But wouldn't this mean string manipulation and evaluation on the logs all the time? The other team has only just started using it, so I have no information on how things will go in the long run. Right now their Loki + Grafana is lightning fast, but as longer and longer logs are generated daily, will things become slower?

  2. Between what my teams have been doing (Prometheus + Grafana) and Loki, which is preferable for our use case? Are there significant long-run advantages that would justify switching from Prometheus + Grafana to Loki + Grafana?

  3. How long does Loki keep the logs? Can we configure Loki to keep logs for x months, then move and compress older logs?

Thank you.

Right now their Loki + Grafana is lightning fast, but as longer and longer logs are generated daily, will things become slower?

Not necessarily. There is no magic bullet here: Loki has to read the log content in order to generate those graphs, so it is a question of how much data you make it read. Loki can be configured to break queries into pieces and execute them in parallel, so it can be made to query a lot of data very quickly; however, setting up this operational model is more complicated.
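
As a rough sketch of those knobs in a Loki 2.x config (exact key names and placement vary by version, so verify against the docs for your release):

```yaml
# Hypothetical loki.yaml excerpt: query splitting and sharding
query_range:
  parallelise_shardable_queries: true   # shard one query across many queriers

limits_config:
  split_queries_by_interval: 30m        # break long time ranges into parallel subqueries
  max_query_parallelism: 16             # cap on concurrent subqueries per query
```

The trade-off is operational: to benefit from this you need a query frontend and multiple queriers running, which is more moving parts than a single binary.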

Between what my teams have been doing (Prometheus + Grafana) and Loki, which is preferable for our use case? Are there significant long-run advantages that would justify switching from Prometheus + Grafana to Loki + Grafana?

They complement each other. Prometheus does a much better job of enabling queries over longer periods of time when you aggregate metrics directly in your applications. There are some nuanced details here, but the long and short of it is: a metric is a float64 and a log line is a string, so it largely comes down to how much data has to be processed.

It’s not uncommon to have log streams that produce gigabytes per hour, or even gigabytes per second, in which case extracting metrics from them means processing that much data. Loki can be set up to do this in parallel and can still be very fast, but it will never be as fast as a precalculated metric stored in a metrics database like Prometheus.
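
To make that concrete, here is roughly the same question asked both ways (label names and log format invented for illustration):

```
# Prometheus: reads back a handful of precomputed float64 samples
sum(rate(http_requests_total{code="500"}[5m]))

# Loki: scans, decompresses, and filters every matching log line in the range
sum(rate({app="webservice"} |= "status=500" [5m]))
```

Both can drive the same Grafana panel; the difference is how much work happens at query time.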

So we use both. All of our applications export metrics, and we use those for alerts and dashboards. We also log a lot of data and process it with Loki; that data serves more specific needs like troubleshooting and debugging, with the advantage of supporting very high cardinality and far more context. Typically, though, we use metrics to narrow the time range down as far as possible before we start digging into the logs.
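
For example, once a metrics dashboard has narrowed an incident down to a few minutes, a single log query can pull the full context for one request; the label and field names here are hypothetical:

```logql
# Grab the error lines, parse the JSON, keep one request's trail
{app="webservice"} |= "error" | json | trace_id="abc123"
```

A label like trace_id would blow up a metrics database's cardinality, but it is cheap for Loki because it is only extracted at query time.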

How long does Loki keep the logs? Can we configure Loki to keep logs for x months, then move and compress older logs?

Loki compresses the logs and stores them in an object store like GCS or S3, and it is already designed to store them very inexpensively. Retention is configured both in Loki and by setting a TTL on the objects in the object store; it can be as long as you want, or infinite. There is no penalty for storing data for longer periods aside from the storage costs, and querying is not affected by the age of the data: querying one day of logs from yesterday or from a year ago is the same to Loki. Query performance is affected more by the length of the query (asking for 7d or 30d of logs vs. 1d, for example).
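
As a sketch of the Loki side, using the compactor-based retention available in recent releases (key names from memory; double-check the docs for your version, and remember to set a matching lifecycle/TTL rule on the bucket itself, e.g. an S3 lifecycle policy):

```yaml
# Hypothetical loki.yaml excerpt: delete data older than ~3 months
compactor:
  working_directory: /loki/compactor
  retention_enabled: true

limits_config:
  retention_period: 2160h   # 90 days; 0 disables deletion (keep forever)
```

There is no separate "move and compress" step to configure: chunks are compressed before they ever reach the object store.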

Hope this helps!