Grafana K6 - New user - Memory issues reading from JSON and/or CSV when file size is large

Hi All,

Apologies for the question as it may sound silly, BUT I’m currently on the lookout for a new load testing tool. We currently use NeoLoad and find it very user-unfriendly, and their support is poor to non-existent. I’ve been pointed towards Grafana k6 and currently have it running in a Docker container for my investigatory work. I love the coding aspect of it, as well as the lovely results graphs.

We have our own console app which generates pre-signed payloads to use in our HTTP requests (ideally I’d just do it within k6 itself, but I do not understand crypto at all :frowning:).

Our k6 script reads from that generated JSON file, and we are able to send HTTP requests to our API fine.

My issue is that if our generated file has more than 15k records (JSON objects), the test tries to initialize and just fails partway through.

Please see current script:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';
import { SharedArray } from 'k6/data';
import { scenario } from 'k6/execution';

// Load configuration from config.json
const config = JSON.parse(open('/config/config.json'));

// Extract MPIDs with include set to true and their corresponding names
const includedMPIDs = config.mpidDetails
  .filter((item) => item.include === true)
  .map((item) => item.name);

// Load data for all included MPIDs into a shared array
const allData = [];
includedMPIDs.forEach((mpid) => {
  const mpidName = mpid.toLowerCase();
  const data = JSON.parse(open(`/data/${mpidName}.json`));
  data.forEach((row) => {
    allData.push({ ...row, mpid: mpidName }); // Include MPID info with each row
  });
});

// Use a shared array to distribute the data
const sharedData = new SharedArray('Combined Data', () => allData);

// Counter for errors
export let errorCount = new Counter('errors');

// Dynamically set VUs and iterations based on the number of records
// const totalRecords = includedMPIDs.reduce((sum, mpid) => sum + allData[mpid].length, 0);
// This is the test configuration
export let options = {
  scenarios: {
    'process-all-data': {
      executor: 'shared-iterations',
      vus: 100, // Number of VUs
      iterations: sharedData.length, // Total number of iterations
      maxDuration: '1h', // Maximum test duration
    },
  },
  cloud: {
    distribution: {
      distributionLabel1: { loadZone: 'amazon:gb:london', percent: 100 },
    },
  },
  tlsAuth: [
    {
      cert: `-----BEGIN CERTIFICATE----- sometext -----END CERTIFICATE-----`, // Inline PEM client certificate
      key: `-----BEGIN PRIVATE KEY----- some text -----END PRIVATE KEY-----`, // Inline PEM private key
    },
  ],
};

// Log start time in setup()
export function setup() {
  const startTime = new Date().toISOString();
  console.log(`Test started at: ${startTime}`);
  return { startTime }; // Pass start time to teardown
}

// Function to make requests with data
function makeRequest(dataRow) {
  const { mpid, if_no, payload, dip_sig, dip_time, dip_hash, dip_sig_cert } = dataRow;

  let endpoint = `${config.endpoint}${config.subPath}${mpid.toLowerCase()}/${dataRow.if_no.toLowerCase()}`;
  if (mpid.toLowerCase() == 'xxx') {
    endpoint = `${config.endpoint}${config.subPath}${dataRow.if_no.toLowerCase()}`;
  }

  const headers = {
    'Content-Type': 'application/json',
    'X-DIP-Signature': dip_sig,
    'X-DIP-Signature-Date': dip_time,
    'X-DIP-Content-Hash': dip_hash,
    'X-DIP-Signature-Certificate': dip_sig_cert,
  };

  const res = http.post(endpoint, payload, { headers });

  let responseBody;
  try {
    responseBody = res.body.trim() !== '' ? res.json() : 'Empty response body';
  } catch (e) {
    responseBody = res.body;
  }

  // console.log(JSON.stringify({
  //   mpid: mpid,
  //   if_no: dataRow.if_no,
  //   status: res.status,
  //   response: responseBody,
  //   timestamp: new Date().toISOString(),
  // }));

  const successStatusCode = check(res, {
    'status is 201': (r) => r.status === 201,
  });

  const successBodyParse = check(res, {
    'response body can be parsed to JSON': (r) => {
      try {
        if (r.body.trim() !== '') {
          JSON.parse(r.body);
          return true;
        }
      } catch (e) {
        console.error(`Failed to parse response body: ${e.message}`);
      }
      return false;
    },
  });

  if (!successStatusCode || !successBodyParse) {
    errorCount.add(1);
  }
}

// Main test function
export default function () {
  // Get the current iteration's data based on scenario.iterationInTest
  const iterationIndex = scenario.iterationInTest;
  const dataRow = sharedData[iterationIndex];

  // Process the data
  makeRequest(dataRow);

  sleep(1); // Simulate realistic pacing
}

// Log end time in teardown()
export function teardown(setupData) {
  const endTime = new Date().toISOString();
  console.log(`Test started at: ${setupData.startTime}`);
  console.log(`Test ended at: ${endTime}`);
}

My script reads the following config.json (some values obfuscated):
{
  "endpoint": "https://xxx/",
  "subPath": "xxx/",
  "mpidDetails": [
    {
      "name": "xxx",
      "include": true,
      "includedIfTypes": ["IF-003"],
      "apiKey": ""
    }
  ]
}

I’m aware there are experimental packages that I could potentially use, but I do not understand them well enough to incorporate them.

I guess I’m asking a few things…

  1. Can k6 handle reading external files of potentially >1 GB in size, or multiple files that are >1 GB combined? I’m assuming SharedArray is not the correct route.
  2. What is the alternative that best fits this scenario? And how/where might I implement it?
  3. If I had skeleton payloads, would it be better to use k6 to update them and then use SubtleCrypto for the message signing in real time (i.e. generate + sign the payloads), rather than reading from a file?

Thanks


Hello @chrisellison,

Welcome to our community forums! :tada:

First of all, I’d like to understand a bit better how you are running your test and how it fails.

My issue is that if our generated file has more than 15k records (JSON objects), the test tries to initialize and just fails partway through.

When you say it fails, does it mean it fails due to out-of-memory (OOM) issues? Are you running it locally, or in GCk6 (Cloud)?

That said, let’s dive into your specific questions:

  1. Can k6 handle reading external files of potentially >1 GB in size, or multiple files that are >1 GB combined? I’m assuming SharedArray is not the correct route.

In general, k6 is not very good at handling large files; it is something we’ve been working on from time to time and will likely continue to work on in the near future. If you’re running your test locally, it will consume a considerable amount of memory, but it should be feasible.

Why do you say SharedArray is not the correct route? On the contrary, I’d suggest keeping it, but with a few minor changes.

  2. What is the alternative that best fits this scenario? And how/where might I implement it?

The first two recommendations I’d make are (see the sketches right after this list):

  • Try to move the data-initialization code inside the SharedArray initialization function, as in this example. In your script, the initialization code sits outside that function, and the function just returns the global constant defined above (allData).
    • Note that, as stated in the docs: SharedArray is an array-like object that shares the underlying memory between VUs. The function executes only once, and its result is saved in memory once. So, in your example, allData isn’t shared but duplicated, which makes the use of SharedArray likely useless. Try to follow the examples.
  • Try to use the open function from the experimental fs package - see these docs. To do so, you mostly need to add one additional import: import { open } from 'k6/experimental/fs'; - see open.
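
To make the first recommendation concrete, here’s a minimal sketch using the file paths and mpidDetails shape from your own script - everything else is unchanged:

import { SharedArray } from 'k6/data';

// All loading and parsing happens inside the factory, so the result is
// saved in shared memory once, instead of building a throwaway per-VU
// copy in the init context first.
const sharedData = new SharedArray('Combined Data', () => {
  const config = JSON.parse(open('/config/config.json'));
  const allData = [];
  config.mpidDetails
    .filter((item) => item.include === true)
    .map((item) => item.name.toLowerCase())
    .forEach((mpidName) => {
      const data = JSON.parse(open(`/data/${mpidName}.json`));
      data.forEach((row) => allData.push({ ...row, mpid: mpidName }));
    });
  return allData; // must be an array of JSON-serializable items
});

And for the second recommendation, note that the experimental fs module reads files in chunks instead of loading them whole; its documented usage pattern looks roughly like this (the file name is just an example):

import { open } from 'k6/experimental/fs';

// open() is async, so the init context uses this wrapper pattern.
let file;
(async function () {
  file = await open('/data/example.json');
})();

export default async function () {
  const buffer = new Uint8Array(128);
  let bytesRead;
  while ((bytesRead = await file.read(buffer)) !== null) {
    // process buffer.subarray(0, bytesRead) chunk by chunk
  }
}
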
  3. If I had skeleton payloads, would it be better to use k6 to update them and then use SubtleCrypto for the message signing in real time (i.e. generate + sign the payloads), rather than reading from a file?

Note that large data shared across VUs is read-only, so unless that’s something you would do as part of each iteration, I think your current approach is better. Whether the crypto work should be part of each iteration or not really depends on your use case and scenario, so I cannot tell.

If that would prevent you from having to load large data files, then it might be worth a try, because as I said above, k6 is not very good at handling large files.


I hope that helps!

Hi @joanlopez ,

thanks for your reply.

I have found that running k6 locally, outside of the Docker container and with no instances of Google Chrome running etc., prevented any memory issues. However, I will respond to your queries anyway:

  1. I was running it locally in a Docker container, not in the cloud. For the specific error, I did not look at the logs of the k6 instance in the Docker container; however, after I ran the command to start the k6 script, the initialization progress bar appeared, failed to complete, and just stopped loading.

  2. Using SharedArray together with the experimental fs open seems to have done the trick so far; it needs further testing with larger files.

  3. In each iteration I’d generate a unique payload using the in-memory skeleton payloads, sign it, and POST it to our API. I have an open forum ticket regarding this; however, it has been temporarily hidden by the bot until it has been reviewed (id: 139362?). In it I discuss a pre-request script in Postman (which signs the payloads perfectly), resulting in a 201 response from our API, vs. the k6 script, which results in an invalid-signature response from our API.

I think I’d love to explore 3) above; however, until it has been reviewed I can’t really progress with it.
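
For reference, here’s the rough shape of what I’d like to try, sketched with k6’s experimental WebCrypto API. The HMAC-SHA-256 scheme, the secret handling, and the endpoint URL are assumptions for illustration only - our real scheme is certificate-based, which is exactly the part I haven’t cracked yet:

import http from 'k6/http';
import { crypto } from 'k6/experimental/webcrypto';
import { b64encode } from 'k6/encoding';

// Hypothetical shared secret - our actual signing uses certificates.
const SECRET = __ENV.SIGNING_SECRET || 'replace-me';

// Convert an ASCII string into bytes for the WebCrypto calls.
function strToBuf(s) {
  const buf = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) buf[i] = s.charCodeAt(i);
  return buf;
}

export default async function () {
  // Build a unique payload from an in-memory skeleton.
  const payload = JSON.stringify({ skeletonField: 'value', ts: Date.now() });

  const key = await crypto.subtle.importKey(
    'raw', strToBuf(SECRET),
    { name: 'HMAC', hash: 'SHA-256' },
    false, ['sign'],
  );
  const signature = await crypto.subtle.sign('HMAC', key, strToBuf(payload));

  http.post('https://example.com/api', payload, { // hypothetical endpoint
    headers: {
      'Content-Type': 'application/json',
      'X-DIP-Signature': b64encode(signature), // header name from our API
    },
  });
}
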

Thanks for your help :slight_smile:

IMHO, @joanlopez, what led to the mistakes @chrisellison made are two issues that k6 miscommunicates (or at least, does not communicate well enough):

  1. Every VU gets a clean JS runtime and executes the test script on it.
  2. SharedArray - the mechanism to share data between VUs - presents itself as a constructor, and that is misleading.

Let me elaborate.

Every VU gets a clean JS runtime and executes the test script on it.
Once initiated, the VU can then execute multiple iterations.
The point to stress is that the init part of the script executes once per VU, on a clean JS runtime.

So if your script opens a big file in the init part, that happens once per VU.
You’ve got 200 VUs? The init code ran 200 times.
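
A minimal way to see this for yourself (the file name is hypothetical):

import { sleep } from 'k6';

// Init context: runs once per VU, each on its own clean JS runtime
// (k6 also runs it a few extra times internally, e.g. for setup).
const bigBlob = open('./big-file.json'); // 200 VUs => read ~200 times
console.log('init context executed');    // printed once per VU

export const options = { vus: 200, duration: '10s' };

export default function () {
  sleep(1); // iterations reuse the VU's already-initialized runtime
}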

If you want to save memory and share data between VUs, you should use SharedArray.
SharedArray presents itself as a constructor - a design pattern that implies “here is a new instance that can hold some state”. But in fact it is basically a broker. A cached-value broker.

The SharedArray expects a name and a factory function.

Under the hood, SharedArray accesses a shared-data registry and checks whether some other JS runtime has already started initializing the cached value. If yes, it just waits for the value. The factory executes only if no other JS runtime has started initializing it.

After the factory returns/resolves, the resulting value is scrutinized for the immutability that guarantees thread safety. If it fails this check, the entire k6 run will fail with an error message.
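
In code, the broker behaviour looks like this (users.json and its fields are hypothetical):

import { SharedArray } from 'k6/data';

const users = new SharedArray('users', () => {
  console.log('factory ran'); // logged once, not once per VU
  return JSON.parse(open('./users.json'));
});

export default function () {
  // Each element access returns a fresh copy deserialized from the
  // shared memory; the array itself cannot be modified.
  const user = users[Math.floor(Math.random() * users.length)];
  console.log(user.name);
}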

Unfortunately, I have not found this knowledge stated clearly. I had to reverse-engineer it through trial and error.

What could have looked different?

I suppose that an API in the spirit of the example below would have communicated this better:

import { shared } from 'k6/data';

const data = shared.initOnce('my-data', async () => {
  // await fetch... / open(...) / fs stream / whatever
  // parse...
  // map? filter/reduce based on __ENV?

  return initiatedData;
});

I also believe that the limitation of sharing only arrays is a bad design decision.
I would have liked to share an immutable object with a form like:

const fixtures = shared.initOnce('test-fixtures', () => /* ... */);

fixtures.accounts.admins // => array of admin accounts
fixtures.accounts.users // => array of user accounts
fixtures.accounts.guest // => array of low-privilege guest accounts

fixtures.posts.large // => array of very large blog posts
fixtures.posts.onelinerImage // => array of one-liner blog posts with an image

fixtures.catalog.categories // => array of categories
fixtures.catalog.categories.byName[<name>].products // => array of products
// ...and so on...

I would have loved for the cached shared object to be able to expose APIs that accept a k6/execution instance and return a fixture selection based on iterationId, idInIteration, or whatever logic you have there - but when I tried that, it did not pass the immutability check.

So my pattern is:

import { SharedArray } from 'k6/data'
import execution from 'k6/execution'
import fixtureMgrFactory, { loadFixtures } from './lib/...'

const fx = fixtureMgrFactory(new SharedArray('test-fixtures', loadFixtures));

export default function() {
   const data = fx.getFixturesFor(execution)
   // have fun...
}

I’ll say one more thing, since this discussion has already begun…

The fixtureMgrFactory is a utility, and the parts it exports are shared across scenarios and test scripts. As such, it has to be tested by itself, not as part of the scripts and scenarios it serves.

Currently, we’re testing them in vitest using loads of mocks for k6.

It would be wonderful if k6 either owned a package that provides such mocks, or featured capabilities for functional testing - perhaps in the form of a k6/tests module that exports describe, it, etc. and works with a CLI flag like --unit-test (like Node.js’s node --test)… or maybe a test executor that injects into the scenarios, in addition to the setup data, a tap.js-like object… I don’t know - a good discussion is needed…

IMHO - this will let the k6 ecosystem grow and flourish even more.