The code should’ve been (but isn’t, because of a technical limitation):
const maxVUs = 200;
var data;

if (typeof __VU === "undefined") {
    // this run of the init code only happens so k6 knows which files will be needed
    open("data.json");
} else { // we have __VU, so this is an actual VU initializing
    data = (function () {
        var rawData = JSON.parse(open("data.json")); // data.json is just a big array
        // how big each VU's slice needs to be so the data is divided evenly
        // (with floor a few trailing elements may go unused; ceil would cover everything,
        // but then the last slices could come up short)
        let partSize = Math.floor(rawData.length / maxVUs);
        let part = __VU - 1; // __VU starts from 1, so shift it to a 0-based index
        return rawData.slice(partSize * part, partSize * part + partSize); // only this VU's part
    })();
}

// do stuff with data
Unfortunately … `__VU` is not defined in the init context even when we are actually in a VU, which IMO is a bug, but as previously stated there are other priorities currently that will have an effect on this, so we will fix it when #1007 is merged :).
So we need to come up with some random number instead, and this is what I propose:
const maxVUs = 200;

// we don't check for __VU, as it is never defined
var data = (function () {
    var rawData = JSON.parse(open("data.json")); // data.json is just a big array
    // how big each slice needs to be so the data is divided evenly
    // (with floor a few trailing elements may go unused; ceil would cover everything,
    // but then the last slices could come up short)
    let partSize = Math.floor(rawData.length / maxVUs);
    let __VU = Math.floor(Math.random() * maxVUs); // just pick a random "VU number" in [0, maxVUs)
    return rawData.slice(partSize * __VU, partSize * __VU + partSize); // only that part of the data
})();

// do stuff with data
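To give an idea of the "do stuff with data" part, here is a minimal sketch that continues the snippet above, assuming each element of the array is an object with a `url` property (that property name and the check are made up for the example):

```javascript
import http from "k6/http";
import { check } from "k6";

// ... the `data` definition from above goes here ...

export default function () {
    // pick a random element from this VU's part of the data on every iteration
    let item = data[Math.floor(Math.random() * data.length)];
    let res = http.get(item.url); // "url" is a placeholder property, adjust it to your data
    check(res, { "status was 200": (r) => r.status === 200 });
}
```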
In both cases `maxVUs` needs to be defined by you as well. Given that only the second example currently works, I would recommend that if you have 200 VUs on a machine, you set `maxVUs` to something like 20, so every VU gets 1/20 of the raw data. Obviously in this case `maxVUs` is … not correctly named, so maybe rename it to `dataParts`?
If you are going to spread the test across 4 machines, and if this is applicable in your case, you can also divide the data into 4 parts between the machines.
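A minimal sketch of one way to do that, assuming you tell each machine which quarter is its own through an environment variable passed with k6's `-e` flag (the `INSTANCE` name and the hard-coded counts are just placeholders for this example):

```javascript
const machines = 4;                           // how many machines the test is split across
const instance = Number(__ENV.INSTANCE || 0); // run with e.g.: k6 run -e INSTANCE=2 script.js
const dataParts = 20;                         // same role as maxVUs above

var data = (function () {
    var rawData = JSON.parse(open("data.json"));
    // first cut out this machine's chunk of the array ...
    let machineSize = Math.floor(rawData.length / machines);
    let machineData = rawData.slice(machineSize * instance, machineSize * instance + machineSize);
    // ... then slice it further into parts, exactly as in the examples above
    let partSize = Math.floor(machineData.length / dataParts);
    let part = Math.floor(Math.random() * dataParts);
    return machineData.slice(partSize * part, partSize * part + partSize);
})();
```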
Something I didn’t mention, as it’s usually less of a problem when you have big data arrays that need to be loaded: since k6 v0.26.0 there is a compatibility mode option for k6, which disables some syntax and niceties but also lowers the memory usage … significantly, for scripts that don’t use that much data.
Our benchmarks show a considerable drop in memory usage - around 80% for simple scripts, and around 50% in the case of a 2MB script with a lot of static data in it.
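If you want to try it, it can be enabled with the `--compatibility-mode=base` CLI flag or the `K6_COMPATIBILITY_MODE` environment variable, e.g. `k6 run --compatibility-mode=base script.js`; just keep in mind that the script then has to avoid the newer syntax that base mode no longer supports.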
Hope this helps you!