How to move chunks and index files from S3 to the local filesystem?

Hi! I want to move our data from S3 to the local filesystem. How can I do that?

1 Like

Thanks @evgenyluvsandugar for posting this here. The question gets asked often enough that hopefully having the answer here will help others:

Yes, this should be possible (and the reverse should be possible too)*

* I’ve never tried this

The chunks themselves can be copied from S3 to the filesystem or vice versa; their content doesn't change. What does need to change is their names and location.

In an object store like S3 (or GCS or Azure Blob) the directory structure looks like this:

bucket_name
    /index
    /tenantID1
    /tenantID2
    ...

If you are using Loki in single user mode, you would see:

bucket_name
    /index
    /fake

As Loki chooses the tenant ID fake (sorry, this isn't the best name, but it's almost impossible to change at this point).

On the local filesystem store, the layout is a little different:

/path/to/filesystem/store
  /index
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NGZlMGYyMDE1OjE3NGZlNDdmODgzOjU4NWMxMTU3
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NjcyNmI1ODYxOjE3NjcyN2Q1NTBkOjc5NGQyMThh
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NGZlNGZjMDJjOjE3NGZlOTUwZDUzOjk1OGY5ZTU5    
  /ZmFrZS85MTg1NDQzM2Q0YzlkNjlmOjE3NjcyY2JiNmNlOjE3NjcyZWMwOWFmOmNlMWQ3ZTdh
  ...

The path here is whatever is defined in the config, like so:

storage_config:
  filesystem:
    directory: /path/to/filesystem/store

Those really long names are actually base64-encoded, and if you decode one you find:

fake/91854433d4c9d69f:174fe0f2015:174fe47f883:585c1157

So this is the problem: converting between an object store and the filesystem store requires prefixing the chunk name with the tenant ID, base64 encoding the result, and storing that in the filesystem directory.
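
To illustrate with the names above, the conversion is just plain base64 of tenantID/chunk name; a quick Python illustration:

import base64

# The decoded object-store key from above (tenantID/chunk name)...
object_key = "fake/91854433d4c9d69f:174fe0f2015:174fe47f883:585c1157"

# ...base64-encoded gives the filesystem file name shown earlier (ZmFrZS85MTg1...)
fs_name = base64.b64encode(object_key.encode("ascii")).decode("ascii")
print(fs_name)

# and decoding reverses it
print(base64.b64decode(fs_name).decode("ascii"))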

The index folder can be copied as-is, I believe (it's only present if you are using the boltdb-shipper index type).

Unfortunately nobody has built any tooling to do this yet as far as I know.

I'm sorry this probably isn't the answer you were hoping for, but it should be possible, although it would very likely require building a small tool to accomplish.
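
To give a rough idea of what such a tool would have to do for the chunks, here is an untested sketch (it assumes boto3, the single fake tenant, and the placeholder bucket name and directory used above):

import base64
import os

import boto3  # assumed S3 client library; any S3 client would do

def main():
    # Untested sketch: copy every chunk object belonging to the "fake" tenant
    # out of the bucket and write it into the filesystem store directory
    # under the base64-encoded name described above.
    s3 = boto3.client("s3")
    bucket = "bucket_name"                # placeholder
    dest = "/path/to/filesystem/store"    # the filesystem directory from the config

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix="fake/"):
        for obj in page.get("Contents", []):
            key = obj["Key"]              # e.g. "fake/91854433d4c9d69f:...:585c1157"
            encoded = base64.b64encode(key.encode("ascii")).decode("ascii")
            s3.download_file(bucket, key, os.path.join(dest, encoded))

if __name__ == "__main__":
    main()

The index folder would still just be copied alongside, unchanged.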

6 Likes

Thank you! It works!

WooHoooooo!!!

Did you build any tooling to do this? Any scripts or something you could share?

Yes, a simple Python script:

import base64
import os

def main():
    # Rename every chunk file in the current directory from its object-store
    # name to the base64-encoded form the filesystem store expects,
    # i.e. base64("fake/<chunk name>").
    for filename in os.listdir("."):
        if os.path.isdir(filename):
            continue  # skip directories such as the index folder
        full_filename = "fake/" + filename
        b64_filename = base64.b64encode(full_filename.encode("ascii")).decode("ascii")
        os.rename(filename, b64_filename)

if __name__ == "__main__":
    main()
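
(The script renames files in the current working directory, so run it from inside the directory holding the chunk files copied down from the bucket; the index directory is copied over separately, unchanged.)
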
2 Likes

Fantastic, thank you so much for your follow-up and for the script. It should be a big help to anyone who finds this in the future!

1 Like

So I want to try the other way around: moving from the filesystem to cloud blob storage, and additionally moving from the single-binary deployment to the distributed setup.

I've checked the filesystem and found the files as described (base64-encoded file names).

The index directory seems to be a little different though? On the filesystem it contains subdirectories which themselves only contain single extensionless files, while the index subdirectories on my cloud storage (Azure Blob Storage) seem to contain .gz files with ingester or compactor in their file names.

Is there a way to convert these index files to the format required in the cloud storage?

As for the chunks, I would reverse the Python script. Then there is the question: do I need to keep the fake tenant ID, or should it be the tenant ID I want this data to be available in? As for the file names, the chunk file names in the blob storage look like UUIDs. Are the filesystem file names also UUIDs, and is there a way to convert between them?
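
Something along these lines is what I have in mind for the reverse direction (an untested sketch, keeping the fake tenant for now):

import base64
import os

def main():
    # Untested sketch: decode each base64 filesystem chunk name back to its
    # object-store key ("fake/<chunk name>") and move the file into a
    # per-tenant directory, ready to be uploaded to the bucket.
    for filename in os.listdir("."):
        if os.path.isdir(filename):
            continue  # skip the index directory
        decoded = base64.b64decode(filename).decode("ascii")
        tenant, chunk = decoded.split("/", 1)
        os.makedirs(tenant, exist_ok=True)
        os.rename(filename, os.path.join(tenant, chunk))

if __name__ == "__main__":
    main()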

Thanks in advance :slight_smile:

@ewelch could you share anything regarding the file names?

The index directory can be copied as-is, no changes are necessary; it will sit alongside the tenant directories.

You will have to keep all the objects in the fake folder. The tenant information is also included in the index and chunks, so you won't be able to just move them to change the tenant ID.

There is a tool we wrote that supports migrating between Loki storage backends; you can use it to change the tenant ID in the process of moving.

2 Likes

I found this useful topic; it's exactly what I needed. I followed the script and it really helped me too, thanks! :handshake:

I've come across the same situation as @geraldp. I want to move my logs from my local filesystem to an S3 bucket.

I've already created a script that decodes the filenames to the new ones, created a /fake dir in the bucket, and added all the files there.

[screenshot of the bucket]

And the /fake folder looks like this

[screenshot of the /fake folder contents]

But when I try to query for any logs, I always get this error:

object not found in storage

Do I have to set any configuration in Loki for it to be able to find the logs?

Thanks! :slight_smile:

@ewelch Do you know what the problem might be?

Sorry I haven't been back here in a bit. Did you work this out, @israelglar?

Could somebody share a script for migrating from the filesystem to S3, please? Thanks a lot.

1 Like

Could this approach be used to aggregate logs in an offline multi-cluster environment? We would have many k3s clusters running Loki offline and want to move their logs to a more robust cluster, where Grafana will be running, to analyze the logs once they come online. The original approach we were pursuing was to use Fluent Bit instead of Promtail, since Fluent Bit has a persistent buffer that can queue logs until it connects to a Loki instance.

I am reaching out to seek guidance on obtaining sample configuration files for the Loki Migrate Tool. My understanding is that these files should contain, at a minimum, the addresses of the source and destination servers, along with authentication details. Additionally, one of my servers uses basic authentication at the nginx proxy level.
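
For reference, what I am picturing (this is just my guess) is two ordinary Loki configuration files, one per side, where the part that matters is the storage section; a destination pointing at S3 might look roughly like this, with placeholders throughout:

# Hypothetical destination config fragment; ACCESS_KEY, SECRET_KEY, REGION and
# BUCKET_NAME are placeholders
schema_config:
  configs:
  - from: '2022-02-24'
    store: boltdb-shipper
    object_store: s3
    schema: v11
    index:
      prefix: index_
      period: 24h
storage_config:
  aws:
    s3: s3://ACCESS_KEY:SECRET_KEY@REGION/BUCKET_NAME
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    shared_store: s3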

I am currently facing an issue where Loki is unable to clear old data that exceeds the retention period from local storage. The storage size is growing, and the logs show the following errors:

-------------------------------
level=error ts=2023-11-28T09:56:25.066736207Z caller=table.go:167 table-name=index_19606 org_id=fake msg="index set has some problem, cleaning it up" err="gzip: invalid checksum"
level=error ts=2023-11-28T09:56:25.066855929Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="gzip: invalid checksum"
-------------------------------
level=info ts=2023-11-28T12:01:33.461054843Z caller=compactor.go:364 msg="applying retention with compaction"
level=info ts=2023-11-28T12:01:33.47055703Z caller=compactor.go:490 msg="compacting table" table-name=index_19606
level=error ts=2023-11-28T12:01:33.520575449Z caller=compactor.go:430 msg="failed to compact files" table=index_19606 err="gzip: invalid checksum"
level=error ts=2023-11-28T12:01:33.52061652Z caller=compactor.go:370 msg="failed to run compaction" err="gzip: invalid checksum"
-------------------------------

I am using Loki server version 2.5.0
Below is my configuration file:

auth_enabled: false
server:
  http_listen_port: 3100
  grpc_server_max_concurrent_streams: 1000
  grpc_server_max_recv_msg_size: 10000000
  grpc_server_max_send_msg_size: 10000000
  http_server_read_timeout: 5m0s
  http_server_write_timeout: 5m0s
  http_server_idle_timeout: 5m0s
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
  - from: '2022-02-24'
    store: boltdb-shipper
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  retention_enabled: true
limits_config:
  max_global_streams_per_user: 10000
  per_stream_rate_limit: 10M
  per_stream_rate_limit_burst: 20M
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 24
  retention_stream:
  - selector: '{job="sqlog"}'
    priority: 1
    period: 60d
ruler:
  rule_path: /tmp/loki/rules-temp
querier:
  query_timeout: 5m0s
  engine:
    timeout: 6m0s

Therefore, I thought that perhaps migrating the data to another server might be the only solution. I would appreciate any assistance you can provide.

Thank you for your time and consideration.