Hi,
I’m exploring the possibility of using Loki for long-term log retention (e.g., 10 years) for compliance and archival purposes. Is it recommended to use Loki at all for this purpose?
Could Loki introduce native support for long-term storage optimization, such as downsampling, archival features, or efficient handling of large indexes?
Is it possible to handle this with the storage schema at all?
If there are existing discussions, documentation, or tools that can help optimize Loki for long-term retention, please share them.
I don’t think there is any inherent problem with using Loki for long-term log storage, as long as you are using a backend such as S3. Regarding your questions:
- Downsampling does not make sense for logs.
- There is no archival feature in Loki itself, since it offloads storage to S3 (or whatever other object storage you use). You can archive objects in S3 (e.g., to Glacier), but you’d have to manually restore both the index and the chunk files from Glacier back to S3 before you can query them, which is not ideal.
- Not sure what you mean by large indexes. Loki keeps one index file per day (after compaction), it won’t get that big even if you have TBs of logs.
In general, as long as you don’t mind paying for the S3 storage, I don’t see any problem with this. I don’t have this requirement myself, so I haven’t actually attempted it. Hopefully someone else has more input.
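To the storage-schema question: long retention in Loki is mostly a matter of configuration rather than a separate feature. Below is a minimal sketch of what a 10-year-retention setup against S3 might look like. The bucket name, endpoint, dates, and paths are placeholders, and exact key names can vary between Loki versions, so check the configuration reference for your release.

```yaml
# Hypothetical sketch of a long-retention Loki config (values are placeholders).
schema_config:
  configs:
    - from: 2024-01-01        # date this schema takes effect
      store: tsdb             # TSDB index keeps per-day index files small
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h           # one index file per day, compacted

storage_config:
  aws:
    s3: s3://ACCESS_KEY:SECRET_KEY@my-s3-endpoint
    bucketnames: loki-logs    # placeholder bucket name

limits_config:
  retention_period: 87600h    # ~10 years; retention is enforced by the compactor

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: s3
```

With `retention_enabled: true`, the compactor deletes data older than `retention_period`; setting the period to ~10 years effectively keeps everything for the compliance window without any extra archival layer.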
Thanks, Tony, for the feedback.
You’re absolutely right about the index size.
We don’t mind the S3 storage at all (since it’s on-premises hardware storage, not in the cloud!), but we’re concerned about querying data from 2-3 years ago and are unsure how Loki behaves for such queries. As you mentioned, since Loki lacks an archival feature, we’d need to find an alternative way to move old data to another storage backend and restore it later, which adds extra effort.
Is there any chance Loki could consider adding an archival feature? We’re looking for a way to handle this within Loki itself instead of relying on additional layers or tools for archiving old data.
If everything is in an S3 bucket, querying logs from 3 years ago is no different from querying logs from a day ago. The more likely concern is if you need to query a month of logs at a time; then you’d need to scale your cluster considerably. Otherwise I don’t think this is a problem.
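On the large-query concern: one common mitigation is to split a big historical query into smaller time slices and issue them sequentially, which is the same idea behind Loki’s own `split_queries_by_interval` setting in the query frontend. A rough client-side sketch of the slicing logic, with a hypothetical helper name:

```python
from datetime import datetime, timedelta


def split_range(start: datetime, end: datetime, window: timedelta):
    """Split [start, end) into consecutive slices no longer than `window`.

    Each slice could then be sent as its own query_range request so no
    single query has to scan months of chunks at once.
    """
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        slices.append((cursor, upper))
        cursor = upper
    return slices


# Example: break a 30-day historical query into 1-day slices.
slices = split_range(datetime(2022, 1, 1), datetime(2022, 1, 31),
                     timedelta(days=1))
print(len(slices))  # 30 one-day windows
```

In practice you would let the query frontend do this for you by tuning `split_queries_by_interval` and scaling out queriers, but the sketch shows why old data is not special: each slice is just an ordinary bounded query against the bucket.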
I am just a regular user like you (if you are a regular user lol). I would recommend submitting a feature request on GitHub to give the Grafana Loki team a chance to triage it.