I’ve spent the last couple of weeks struggling to get Loki (and Tempo) running properly in an on-premises Kubernetes cluster.
My first attempt used the “single binary” approach, but the Helm chart (helm-charts/charts/loki at main · grafana/helm-charts · GitHub) lets you inappropriately attempt to scale it, and I ran into problems because Loki got confused when multiple instances were running. For those details, see my post in the Tempo forum: “Traces and Logs intermittently disappearing from Tempo and Loki”.
My production use case will involve a lot of logging, so I don’t think the “single binary” approach is appropriate; I need the ability to scale up. My impression was that running Loki in distributed/microservices mode would be the way to go. However, I do NOT want to rely on external storage like S3, GCS, or Azure, and after actually testing the loki-distributed Helm chart with filesystem storage, it seems this does NOT work (details in my “Logs disappearing” post).
The problem is that I am seeing conflicting information in the documentation, and it is confusing me. I thought it was possible to use filesystem storage with boltdb-shipper because the Architecture → Storage → Single Store page says:
Loki stores all data in a single object storage backend. This mode of operation became generally available with Loki 2.0 and is fast, cost-effective, and simple, not to mention where all current and future development lies. This mode uses an adapter called boltdb_shipper to store the index in object storage (the same way we store chunks).
OK, sounds like boltdb-shipper is the way to go! All future development lies here…
Oops, but the Single Store Loki (boltdb-shipper index type) page doesn’t mention anything about using local storage with boltdb-shipper.
Then on the Filesystem Object Store page it explicitly states filesystem doesn’t work with scaling…
So perhaps I’ve been thoroughly confused by the Helm chart’s inappropriate flexibility. Can someone who knows for sure please clarify whether external storage (S3, GCS, Azure, etc.) is required for Loki in distributed mode if you actually want to take advantage of its scaling capabilities? I have the same question for Tempo.