Sharing a data file between all VUs

The code should’ve been (but isn’t, because of a technical limitation):

const maxVUs = 200;
var data;
if (typeof __VU === "undefined") {
  // This is the init-context execution that runs only so k6 knows which files will be needed
  open("data.json");
} else { // we have __VU
  data = (function() {
    var rawData = JSON.parse(open("data.json")); // we read and parse data.json, which is just a big array
    // How many items each VU gets, so the split is even. Maybe use ceil instead of floor,
    // as floor will possibly drop some trailing values ... but with ceil there will be overlap ...
    var partSize = Math.floor(rawData.length / maxVUs);
    return rawData.slice(partSize * (__VU - 1), partSize * __VU); // __VU starts from 1, so shift by one to get only this VU's part
  })();
}
// do stuff with data

Unfortunately … __VU is not defined in the init context even when we are actually in a VU, which IMO is a bug. But as previously stated, there are other priorities currently that will have an effect on this, so we will fix it when #1007 is merged :).
So for now we need to come up with some random number instead, and this is what I propose:

const maxVUs = 200;
// we don't check for __VU, as it is never defined in the init context
var data = (function() {
  var rawData = JSON.parse(open("data.json")); // we read and parse data.json, which is just a big array
  // How many items each part gets, so the split is even. Maybe use ceil instead of floor,
  // as floor will possibly drop some trailing values ... but with ceil there will be overlap ...
  var partSize = Math.floor(rawData.length / maxVUs);
  var __VU = Math.floor(Math.random() * maxVUs); // just pick a random "VU number" between 0 and maxVUs - 1
  return rawData.slice(partSize * __VU, partSize * __VU + partSize); // we get only the part for that number
})();
// do stuff with data

In both cases maxVUs needs to be defined by you as well. Given that only the second example currently works, I would recommend that if you have 200 VUs on a machine, you set maxVUs to something like 20, so every VU gets a random 1/20 of the raw data. Obviously in this case maxVUs is … not correctly named, so maybe rename it to dataParts?
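To make the slicing arithmetic concrete, here is a minimal, runnable sketch of the same math with the proposed dataParts = 20 rename. The getPart helper is a hypothetical name of mine, and a plain array stands in for the JSON.parse(open("data.json")) call, which only works inside k6:

```javascript
// Hypothetical helper illustrating the slicing math with dataParts = 20.
// In a real k6 script, rawData would come from JSON.parse(open("data.json")).
function getPart(rawData, partNumber, dataParts) {
  var partSize = Math.floor(rawData.length / dataParts); // items per part
  return rawData.slice(partSize * partNumber, partSize * partNumber + partSize);
}

// With 100 items and 20 parts, every part gets exactly 5 items:
var rawData = [];
for (var i = 0; i < 100; i++) rawData.push(i);

var part0 = getPart(rawData, 0, 20);   // → [0, 1, 2, 3, 4]
var part19 = getPart(rawData, 19, 20); // → [95, 96, 97, 98, 99]
```

Note that because 100 divides evenly by 20, nothing is dropped here; with, say, 103 items, floor would leave the last 3 items unused.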

If you are going to split the load between 4 machines, and if this is applicable, you can also divide the data into 4 parts between the machines :wink:
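A sketch of that machine-level split. machineSlice and MACHINE_INDEX are hypothetical names of mine; in k6 you could pass the index with `k6 run -e MACHINE_INDEX=0 script.js` and read it from `__ENV.MACHINE_INDEX`:

```javascript
// Hypothetical helper: give each of the 4 machines its own quarter of the data.
// In a real k6 script, machineIndex would come from the environment, e.g.:
//   var machineIndex = parseInt(__ENV.MACHINE_INDEX, 10);
function machineSlice(rawData, machineIndex, machines) {
  var machineSize = Math.floor(rawData.length / machines); // items per machine
  return rawData.slice(machineSize * machineIndex, machineSize * machineIndex + machineSize);
}

// With 100 items and 4 machines, machine 1 gets items 25..49:
var rawData = [];
for (var i = 0; i < 100; i++) rawData.push(i);

var machineData = machineSlice(rawData, 1, 4); // 25 items, starting at 25
```

Each VU on that machine would then slice machineData further, exactly as in the examples above.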

Something I didn’t mention, as it is usually less of a problem when you have big data arrays that need to be loaded: since k6 v0.26.0 there is a compatibility mode option for k6, which disables some syntax and niceties but also lowers the memory usage … significantly, for scripts that don’t use that much data.

Our benchmarks show a considerable drop in memory usage: around 80% for simple scripts, and around 50% in the case of a 2MB script with a lot of static data in it.
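For reference, a minimal invocation might look like the following (script.js is just a placeholder name). Keep in mind that in base mode the script has to be plain ES5.1, so for example the const in the snippets above would need to become var:

```shell
# Run with the "base" compatibility mode to skip the ES6+ transformation
# layer and reduce memory usage:
k6 run --compatibility-mode=base script.js
```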

Hope this helps you :wink: