Grafana K6 - New user - Memory issues reading from JSON and/or CSV when file size is large

Hi All,

Apologies for the question as it may sound silly, but I’m currently on the lookout for a new load testing tool. Currently we use NeoLoad and find it very user-unfriendly, and their support is poor to non-existent. I’ve been pointed towards Grafana k6 and currently have it running in a Docker container for my investigatory work. I love the coding aspect of it, as well as the lovely results graphs.

We have our own console app which generates pre-signed payloads to use in our HTTP requests (ideally I’d just want to do it within k6 itself, but I do not understand crypto at all :frowning: ).

Our k6 script reads from that generated JSON file, and we are able to send HTTP requests to our API fine.

My issue is that if our generated file has more than 15k records (JSON objects), the test tries to initialize and just fails partway through.

Please see my current script:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';
import { SharedArray } from 'k6/data';
import { scenario } from 'k6/execution';

// Load configuration from config.json
const config = JSON.parse(open('/config/config.json'));

// Extract MPIDs with include set to true and their corresponding names
const includedMPIDs = config.mpidDetails
  .filter((item) => item.include === true)
  .map((item) => item.name);

// Load data for all included MPIDs into a shared array
const allData = [];
includedMPIDs.forEach((mpid) => {
  const mpidName = mpid.toLowerCase();
  const data = JSON.parse(open(`/data/${mpidName}.json`));
  data.forEach((row) => {
    allData.push({ ...row, mpid: mpidName }); // Include MPID info with each row
  });
});

// Use a shared array to distribute the data
const sharedData = new SharedArray('Combined Data', () => allData);

// Counter for errors
export let errorCount = new Counter('errors');

// Dynamically set VUs and iterations based on the number of records
// const totalRecords = includedMPIDs.reduce((sum, mpid) => sum + allData[mpid].length, 0);
// This is the test configuration
export let options = {
  scenarios: {
    'process-all-data': {
      executor: 'shared-iterations',
      vus: 100, // Number of VUs
      iterations: sharedData.length, // Total number of iterations
      maxDuration: '1h', // Maximum test duration
    },
  },
  cloud: {
    distribution: {
      distributionLabel1: { loadZone: 'amazon:gb:london', percent: 100 },
    },
  },
  tlsAuth: [
    {
      cert: '-----BEGIN CERTIFICATE----- sometext -----END CERTIFICATE-----', // Your PEM client certificate
      key: '-----BEGIN PRIVATE KEY----- some text -----END PRIVATE KEY-----', // Your PEM private key
    },
  ],
};

// Log start time in setup()
export function setup() {
  const startTime = new Date().toISOString();
  console.log(`Test started at: ${startTime}`);
  return { startTime }; // Pass start time to teardown
}

// Function to make requests with data
function makeRequest(dataRow) {
  const { mpid, if_no, payload, dip_sig, dip_time, dip_hash, dip_sig_cert } = dataRow;

  let endpoint = `${config.endpoint}${config.subPath}${mpid.toLowerCase()}/${dataRow.if_no.toLowerCase()}`;
  if (mpid.toLowerCase() == 'xxx') {
    endpoint = `${config.endpoint}${config.subPath}${dataRow.if_no.toLowerCase()}`;
  }

  const headers = {
    'Content-Type': 'application/json',
    'X-DIP-Signature': dip_sig,
    'X-DIP-Signature-Date': dip_time,
    'X-DIP-Content-Hash': dip_hash,
    'X-DIP-Signature-Certificate': dip_sig_cert,
  };

  const res = http.post(endpoint, payload, { headers });

  let responseBody;
  try {
    responseBody = res.body.trim() !== '' ? res.json() : 'Empty response body';
  } catch (e) {
    responseBody = res.body;
  }

// console.log(JSON.stringify({
// mpid: mpid,
// if_no: dataRow.if_no,
// status: res.status,
// response: responseBody,
// timestamp: new Date().toISOString(),
// }));

  const successStatusCode = check(res, {
    'status is 201': (r) => r.status === 201,
  });

  const successBodyParse = check(res, {
    'response body can be parsed to JSON': (r) => {
      try {
        if (r.body.trim() !== '') {
          JSON.parse(r.body);
          return true;
        }
      } catch (e) {
        console.error(`Failed to parse response body: ${e.message}`);
      }
      return false;
    },
  });

  if (!successStatusCode || !successBodyParse) {
    errorCount.add(1);
  }
}

// Main test function
export default function () {
  // Get the current iteration's data based on scenario.iterationInTest
  const iterationIndex = scenario.iterationInTest;
  const dataRow = sharedData[iterationIndex];

  // Process the data
  makeRequest(dataRow);

  sleep(1); // Simulate realistic pacing
}

// Log end time in teardown()
export function teardown(setupData) {
  const endTime = new Date().toISOString();
  console.log(`Test started at: ${setupData.startTime}`);
  console.log(`Test ended at: ${endTime}`);
}

My script reads the following config.json (some values obfuscated):
{
  "endpoint": "https://xxx/",
  "subPath": "xxx/",
  "mpidDetails": [
    {
      "name": "xxx",
      "include": true,
      "includedIfTypes": ["IF-003"],
      "apiKey": ""
    }
  ]
}

I’m aware there are experimental packages that I could potentially use, but I do not understand them well enough to incorporate them.

I guess I’m asking a few things…

  1. Can k6 handle reading external files of potentially >1 GB in size, or multiple files that total >1 GB when combined? (Assuming SharedArray is not the correct route.)
  2. What is the alternative that best fits this scenario, and how/where might I implement it?
  3. If I had skeleton payloads, would it be better to use k6 to update them and then use SubtleCrypto for the message signing in real time (i.e. generate + sign the payloads), rather than reading from a file?

Thanks


Hello @chrisellison,

Welcome to our community forums! :tada:

First of all, I’d like to understand a bit better how you are running your test, and how it fails.

My issue is that if our generated file has more than 15k records (JSON objects), the test tries to initialize and just fails partway through.

When you say it fails, does it mean it fails due to out-of-memory (OOM) issues? Are you running it locally, or in Grafana Cloud k6?

That said, let’s dive into your specific questions:

  1. Can k6 handle reading external files of potentially >1 GB in size, or multiple files that total >1 GB when combined? (Assuming SharedArray is not the correct route.)

In general, k6 is not very good at handling large files; it’s something we’ve been working on from time to time and will likely continue to improve in the near future. If you’re running your test locally, it will consume a considerable amount of memory, but it should be feasible.

Why do you say SharedArray is not the correct route? I’d suggest keeping it, but with a few minor changes.

  2. What is the alternative that best fits this scenario, and how/where might I implement it?

The first two recommendations I’d suggest are:

  • Try to move the data-initialization code inside the SharedArray initialization function, as in this example (see the sketch just after this list). In your script, the initialization code runs outside that function, and the function just returns the global constant defined above (allData).
    • Note that, as stated in the docs: SharedArray is an array-like object that shares the underlying memory between VUs. The function executes only once, and its result is saved in memory once. So, in your example, allData isn’t shared but duplicated, which makes the use of SharedArray largely pointless. Try to follow the examples.
  • Try to use the open function from the experimental fs package - see these docs. To do so, you mostly need one additional import: import { open } from 'k6/experimental/fs'; - see open.
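
For illustration, here is a minimal sketch of the first recommendation applied to your script. It reuses the names from your snippet (config, includedMPIDs, the /data/*.json layout), so adjust as needed:

// Build the combined dataset inside the SharedArray init function,
// so it is constructed once and the single result is shared across VUs.
const sharedData = new SharedArray('Combined Data', () => {
  const combined = [];
  includedMPIDs.forEach((mpid) => {
    const mpidName = mpid.toLowerCase();
    const data = JSON.parse(open(`/data/${mpidName}.json`));
    data.forEach((row) => {
      combined.push({ ...row, mpid: mpidName }); // Include MPID info with each row
    });
  });
  return combined; // Only the returned array is kept in shared memory
});

With this structure, the global allData constant (and its per-VU copies) goes away; only the array returned by the function is kept.
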
  3. If I had skeleton payloads, would it be better to use k6 to update them and then use SubtleCrypto for the message signing in real time (i.e. generate + sign the payloads), rather than reading from a file?

Note that data shared across VUs with SharedArray is read-only, so unless generating payloads is something you would do as part of each iteration, I think your current approach is better. Whether the crypto work should happen in each iteration really depends on your use case and scenario, so I can’t say.

If it would spare you from loading large data files, it might be worth a try, because, as I said above, k6 is not very good at handling large files.
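
If you do end up generating and signing payloads inside each iteration, here is a rough sketch using the hmac helper from k6/crypto. It assumes an HMAC-SHA256 signature over the request body with a shared secret, which may well not match your real signing scheme (your certificate-based headers suggest something more involved), and the endpoint and header names here are placeholders, so treat it purely as a starting point:

import http from 'k6/http';
import crypto from 'k6/crypto';

// Hypothetical secret and endpoint, for illustration only.
const SIGNING_SECRET = __ENV.SIGNING_SECRET || 'change-me';
const ENDPOINT = 'https://example.com/api';

export default function () {
  // Build a unique payload for this iteration from an in-memory skeleton.
  const payload = JSON.stringify({ id: __ITER, sentAt: new Date().toISOString() });

  // Sign the payload: hex-encoded HMAC-SHA256 over the request body.
  const signature = crypto.hmac('sha256', SIGNING_SECRET, payload, 'hex');

  http.post(ENDPOINT, payload, {
    headers: {
      'Content-Type': 'application/json',
      'X-DIP-Signature': signature,
    },
  });
}

If your scheme needs SubtleCrypto-style APIs instead, k6 also has a k6/experimental/webcrypto module, but whether either approach can reproduce your console app’s signatures is something you’d have to verify.
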


I hope that helps!

Hi @joanlopez,

Thanks for your reply.

I have found that running k6 locally, outside of the Docker container and with no instances of Google Chrome running, etc., prevented any memory issues. However, I will respond to your queries anyway:

  1. I was running it locally in a Docker container, not in the cloud. For the specific error, I did not look at the logs of the k6 instance in the Docker container; however, after I ran the command to start the k6 script, it showed the initialization progress bar, which failed to complete and just stopped loading.

  2. Using SharedArray and the experimental/fs open seems to have done the trick so far; it needs further testing with larger files.

  3. In each iteration I’d generate a unique payload using the in-memory skeleton payloads, sign it, and POST it to our API. I have an open forum ticket regarding this; however, it has been temporarily hidden by the bot until it has been reviewed (id: 139362?). In it I discuss a pre-request script in Postman (which signs the payloads perfectly) resulting in a 201 response from our API, vs. the k6 script, which results in an invalid-signature response from our API.

I think I’d love to explore 3) above; however, until it has been reviewed I can’t really progress with it.

Thanks for your help :slight_smile: