How to manage historical data in a different S3 bucket

I need to configure two S3 buckets for Loki backend storage. One bucket is for recent data (e.g. 30 days), and all older data should move into a secondary S3 bucket. But when someone queries historical data (older than 30 days), Loki should be able to access both the primary and secondary buckets to get the relevant logs. Is this possible?

I don’t use multiple S3 buckets myself, so I could be wrong, but I believe the intent of multiple S3 buckets is to reduce the chance of being rate limited by AWS when operating a big cluster, and I don’t believe you can control where chunks go.
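For reference, this is roughly what the multi-bucket setup looks like in Loki's storage config: `bucketnames` accepts a comma-separated list, and Loki distributes chunks across the listed buckets (by hash, not by date) to spread out request rate. The bucket names and region below are placeholders, not a tested configuration:

```yaml
storage_config:
  aws:
    # Comma-separated list: chunks are sharded across these buckets
    # by hash, mainly to spread S3 request rate. There is no way to
    # route chunks to a specific bucket by age.
    bucketnames: loki-chunks-a,loki-chunks-b
    region: us-east-1
```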

Just curious, why do you want to separate chunk storage by date?


Thanks for the reply. We expect a large volume per day (roughly 5 TB/day). From our analysis, we understand that most people need recent data most of the time. That's why I thought about moving older data into a different bucket, and that's the main reason for wanting a secondary bucket.

We are not at that volume, so I can't say with definite confidence, but I think you could get away with just one bucket if you run a replication factor of 1. If you run a replication factor of 2 or 3, I'd try two S3 buckets. But you won't be able to control where chunks go, primarily because moving chunks would require re-indexing as well, and I don't think separating chunks by date offers any tangible operational benefit.
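To illustrate the suggestion above: replication factor is what multiplies write volume against S3, so it's the knob that decides whether one bucket is enough. A minimal sketch, with illustrative values only:

```yaml
# Replication factor 1: each chunk is written once, so a single
# bucket sees the least request pressure. RF 2-3 roughly doubles
# or triples writes, which is when a second bucket starts to help.
common:
  replication_factor: 1

storage_config:
  aws:
    bucketnames: loki-chunks   # single bucket, hypothetical name
```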


Hi, I would like to ask: how is query performance at 5 TB of writes per day, and do you see a large number of timeouts? I have some questions about this, thank you.

Short answer is: it depends.

Loki's read performance for the most part comes from query splitting and distribution. If you configure it properly, you can scale horizontally pretty easily. We are using simple scalable mode, and with three read containers we are able to hit about 1.8 to 2 GB/second in terms of bytes processed. I've seen others do 20 GB/s with microservices mode, which is pretty impressive.
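The splitting and distribution mentioned above is mostly controlled by a couple of limits settings: the query frontend splits a long time range into subqueries and fans them out to queriers in parallel. The values below are illustrative, not tuning advice:

```yaml
limits_config:
  # Frontend splits a query's time range into subqueries of this size.
  split_queries_by_interval: 30m
  # How many subqueries can be dispatched to queriers at once.
  max_query_parallelism: 32

querier:
  # Subqueries each querier works on concurrently.
  max_concurrent: 8
```

With settings like these, adding read replicas directly adds workers pulling split subqueries, which is where the horizontal scaling comes from.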

For 5 TB/day I think simple scalable mode would be sufficient, unless you anticipate growth, in which case it may be more efficient to just go for microservices mode, which takes a bit more effort to maintain. You can also join the community Discord and see if others who are running big clusters can share their configurations with you.