How do I read data from a CSV file coming from S3?

I need some help with an example of reading CSV data from a file in S3 and consuming one row of the data per iteration.

I tried the following:

  1. Reading the data file from S3 in the setup() function.
  2. Using the SharedArray approach; however, SharedArray cannot be used in setup(), only in the init context.

Thanks in advance for your help!

How big is the .csv file? If it's too big for every VU to hold its own copy in memory, then unfortunately there is currently no way to do this efficiently :disappointed: You should probably use curl to download the file before your k6 run, and then use open() and SharedArray in the init context to load it efficiently for all VUs.
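As a rough illustration of the init-context step: SharedArray takes a name and a function that returns an array, and that function runs only once, so the CSV parsing belongs inside it. The sketch below is a minimal, dependency-free parser, not an official k6 helper. It assumes a header row and simple comma-separated values with no quoted fields; `parseCsv` and `data.csv` are hypothetical names. For real-world CSV, a full parser such as papaparse is the safer choice.

```javascript
// Minimal CSV-to-objects parser (sketch). Assumptions: the first line is a
// header, fields are comma-separated, and no field contains quotes, commas
// or newlines.
function parseCsv(text) {
  // Split into lines, tolerating \r\n line endings and dropping blank lines.
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return [];
  }
  const header = lines[0].split(",").map((name) => name.trim());
  return lines.slice(1).map((line) => {
    const values = line.split(",");
    const row = {};
    header.forEach((name, i) => {
      // Missing trailing fields become empty strings.
      row[name] = values[i] !== undefined ? values[i].trim() : "";
    });
    return row;
  });
}

// In a k6 script, the parse would run once in the init context, e.g.:
//   import { SharedArray } from "k6/data";
//   const rows = new SharedArray("rows", () => parseCsv(open("./data.csv")));
// where data.csv is a (hypothetical) file downloaded with curl before `k6 run`.
```

Because the callback's result is shared read-only across VUs, each VU avoids holding its own parsed copy, which is the memory saving the reply above is after.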

We plan to address this limitation in the relatively near future and to allow better SharedArray support from setup() and elsewhere. Follow these GitHub issues for updates: SharedArray improvements (grafana/k6 issue #2043) and setup() per scenario (grafana/k6 issue #1638).

The CSV file is not big; however, our automated process requires reading the data from a file in S3.

We mimicked SharedArray-like behavior by converting the CSV into a JSON array, and we pass the number of iterations from Lambda to CodeBuild as a parameter. Using scenario.iterationInTest, we were able to read all the data.
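The indexing idea described above can be sketched in plain JavaScript. This is not Sam's actual code. It assumes the rows are already in an array (the JSON array) and that the scenario runs exactly as many iterations as there are rows, for example a shared-iterations scenario whose iteration count is passed in as a parameter. In k6 the index would come from `exec.scenario.iterationInTest` (imported from `k6/execution`), which is zero-based and unique across all VUs in the scenario; here a plain loop stands in for those iterations so the logic runs outside k6. `rowForIteration` is a hypothetical helper name.

```javascript
// Return the data row for one iteration (sketch).
// Assumption: the scenario runs exactly rows.length iterations, so each
// index 0..rows.length-1 occurs once and every row is used exactly once.
function rowForIteration(rows, iterationInTest) {
  if (
    !Number.isInteger(iterationInTest) ||
    iterationInTest < 0 ||
    iterationInTest >= rows.length
  ) {
    throw new RangeError(
      `iteration ${iterationInTest} has no matching row (rows: ${rows.length})`
    );
  }
  return rows[iterationInTest];
}

// Simulate a scenario whose iteration count equals the number of rows.
// In k6 this would be the default function, called with
// exec.scenario.iterationInTest instead of the loop counter.
const rows = [{ id: "1" }, { id: "2" }, { id: "3" }];
const used = [];
for (let i = 0; i < rows.length; i++) {
  used.push(rowForIteration(rows, i).id);
}
console.log(used.join(",")); // prints "1,2,3"
```

Throwing on an out-of-range index makes a mismatch between the iteration count and the row count visible immediately. If reusing rows is acceptable instead, `rows[iterationInTest % rows.length]` wraps around rather than failing.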

Thank you for your help anyway.

Best, Sam

Hi @ssamineni

When it comes to downloading files directly from S3, you might find our AWS extension useful. As @ned mentioned, just be sure to avoid downloading or copying big files outside of the init context, as that might bloat the k6 process's memory consumption.

Hope that’s helpful :bowing_man:

Hi @oleiade, thanks for the pointer!

May I ask how to download big files within the init context? When I try to use the AWS extension with SharedArray (which is only allowed in the init context), I get a "Making http requests in the init context is not supported" error.

Thanks!

Hey @dshaw1 :wave:

Welcome to the support forum :tada:

Indeed, making HTTP requests in the init context (which the S3 library does) is not allowed at the moment; suggesting otherwise was a mistake in my previous answer.

If you want to use SharedArray, I assume you’re also trying to limit the memory impact of loading such a file in a load test with many VUs. We’re actively working towards implementing the Streams API, which we expect to see land in k6 in a couple of releases. Once it is implemented, I anticipate that we might be able to offer more efficient ways to deal with S3-downloaded files.

In the meantime, I'd recommend loading the file from S3 in the setup function and parsing it as CSV directly there instead.

Let me know if that helps :bowing_man: