Hi! I want to move our data from S3 to the local filesystem, how can I do that?
Thanks @evgenyluvsandugar for posting this here, the question gets asked enough that hopefully it will help others to see the answer here:
Yes this should be possible (and the reverse possible too)*
* I’ve never tried this
The chunks themselves can be copied from S3 to the filesystem or vice versa; their content wouldn't change. What does need to change is the names and the location.
In an object store like S3 (or GCS or Azure Blob) the directory structure looks like this:
```
bucket_name
  /index
  /tenantID1
  /tenantID2
  ...
```
If you are using Loki in single user mode, you would see:
```
bucket_name
  /index
  /fake
```
as Loki chooses the tenant ID of fake (sorry, this isn't the best name, but it's almost impossible to change at this point).
On the local filesystem store the layout is a little different:

```
/path/to/filesystem/store
  /index
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NGZlMGYyMDE1OjE3NGZlNDdmODgzOjU4NWMxMTU3
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NjcyNmI1ODYxOjE3NjcyN2Q1NTBkOjc5NGQyMThh
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NGZlNGZjMDJjOjE3NGZlOTUwZDUzOjk1OGY5ZTU5
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NjcyY2JiNmNlOjE3NjcyZWMwOWFmOmNlMWQ3ZTdh
  ...
```
The path here is what's defined in the config, like so:

```yaml
storage_config:
  filesystem:
    directory: /path/to/filesystem/store
```
Those really long names are actually base64 encoded, and if you decode one you find the tenant ID prefixed to the chunk name.
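For example, decoding the first of the directory names above, using nothing but the standard library's base64 module:

```python
import base64

# First entry from the filesystem store listing above.
name = "ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NGZlMGYyMDE1OjE3NGZlNDdmODgzOjU4NWMxMTU3"

# The decoded value is the tenant ID followed by the chunk name.
print(base64.b64decode(name).decode("ascii"))
# fake/91854433d4c9d69f:174fe0f2015:174fe47f883:585c1157
```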
So this becomes the problem: converting between an object store and the filesystem store requires prepending the tenant ID to the chunk name, base64 encoding the result, and storing that in the filesystem directory.
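A minimal sketch of that mapping in both directions (the function names here are mine, not anything from Loki itself):

```python
import base64

def object_key_to_filesystem_name(tenant_id: str, chunk_name: str) -> str:
    """Prefix the tenant ID and base64 encode, as the filesystem store expects."""
    return base64.b64encode(f"{tenant_id}/{chunk_name}".encode("ascii")).decode("ascii")

def filesystem_name_to_object_key(filename: str) -> str:
    """Decode a filesystem store name back to the '<tenant>/<chunk name>' key."""
    return base64.b64decode(filename).decode("ascii")
```

Encoding the chunk name from the decoded example with tenant `fake` reproduces the first long directory name in the listing above, and decoding that name gets you back to `fake/<chunk name>`.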
The index folder can be copied as-is, I believe (and is only present if you are using the boltdb-shipper index type).
Unfortunately nobody has built any tooling to do this yet as far as I know.
I'm sorry this probably isn't the answer you were hoping for, but it should be possible to do this, although it would very likely require building a tool to accomplish.
Thank you! It works!
Did you build any tooling to do this? Any scripts or something you could share?
Yes, simple python script:
```python
import os
import base64

def main():
    # Prepend the tenant ID ("fake" in single-tenant mode) to each chunk
    # name, base64 encode the result, and rename the file to match the
    # filesystem store's naming scheme.
    for filename in os.listdir("."):
        full_filename = "fake/" + filename
        b64_filename = base64.b64encode(full_filename.encode("ascii")).decode("ascii")
        os.rename(filename, b64_filename)

if __name__ == "__main__":
    main()
```
Fantastic, thank you so much for your follow up and for the script. Should be a big help to anyone who finds this in the future!
So I want to try the other way around: moving from the filesystem to cloud blob storage, and additionally moving from the single-binary to the distributed setup.
I've checked the filesystem and found the files as described (base64 encoded file names).
The index directory seems to be a little different though? The filesystem contains subdirectories which themselves only contain single extensionless files, while the index subdirectories on my cloud storage (Azure Blob Storage) seem to contain .gz files with compactor in their file names. Is there a way to convert these index files to the format required in the cloud storage?
As for the chunks, I would reverse the Python script, but then there is the question: do I need to keep the fake tenant ID, or should it be the tenant ID I want this data to be available in? As for file names, the chunk file names in the blob storage look like UUIDs. Are the filesystem file names also UUIDs, and is there a way to convert between them?
Thanks in advance
@ewelch could you share anything regarding the file names?
The index directory can be copied as-is, no changes are necessary; it will sit alongside the tenant directories.
You will have to keep all the objects in the fake folder. The tenant information is also included in the index and in the chunks themselves, so you won't be able to just move them to change the tenant ID.
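Reversing the earlier script might look something like this (an untested sketch: it decodes each base64 file name in the current directory back to a plain chunk name, which you would then upload under the fake/ prefix in the object store):

```python
import base64
import os

def main():
    for filename in os.listdir("."):
        # Decode "<tenant>/<chunk name>" and drop the tenant prefix;
        # in the object store the tenant is the folder ("fake/") rather
        # than part of the file name.
        decoded = base64.b64decode(filename).decode("ascii")
        _tenant, _, chunk_name = decoded.partition("/")
        os.rename(filename, chunk_name)

if __name__ == "__main__":
    main()
```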
There is a tool we wrote that supports migrating between Loki storage backends; you can use it to change the tenant ID in the process of moving.