k6 cannot execute JS scripts if the CSV file is too large

My local machine has 8 cores and 16 GB of RAM. When the script loads the JSON file, it consumes about 10 GB of memory; the CPU load average is not very high.


When the script does not load the JSON file, it consumes only about 0.5 GB of memory, and the progress bar (===>) moves normally.


This all seems to say that the system you are testing cannot handle the 10k QPS you want it to.

You haven’t specified this, but given the output it seems that you just configured it to run for 5 minutes with 100 VUs and no particular number of iterations, so it did just that.

As such, the progress bar follows the elapsed time (5 minutes) rather than the number of iterations k6 was told to run.
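
For reference, a minimal sketch of such a duration-based configuration (my assumption of roughly what was run, with a placeholder URL, not your actual script):

import http from 'k6/http';

export let options = {
  vus: 100,       // 100 concurrent VUs
  duration: '5m', // run for 5 minutes, no iteration target
};

export default function () {
  http.get('https://test.example.com/api'); // placeholder endpoint
}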

In practice, that run seems to have completed slightly fewer iterations overall than the run with the JSON loading.

And again, this is a lot less than a quarter of the 10k QPS you want, so even without the huge JSON you can see that the system under test can’t handle the required load. This is exactly what I told you to test first, before trying to run with 10 million records in a JSON file :wink:
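
If you want to verify whether the system can actually sustain a target request rate, an arrival-rate scenario is the usual way to express that in k6. A minimal sketch, assuming a 10k-per-second target and a placeholder URL (preAllocatedVUs/maxVUs would need tuning to your response times):

import http from 'k6/http';

export let options = {
  scenarios: {
    verify_rate: {
      executor: 'constant-arrival-rate',
      rate: 10000,           // iterations started per timeUnit
      timeUnit: '1s',        // i.e. 10k iterations per second
      duration: '5m',
      preAllocatedVUs: 1000, // VUs reserved up front
      maxVUs: 5000,          // upper bound if responses get slow
    },
  },
};

export default function () {
  http.get('https://test.example.com/api'); // placeholder endpoint
}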

I guess now it is time to figure out how to make your system faster. Good luck, and you can always come back with more k6 questions.

Thanks a lot @mstoykov, under your guidance I figured out the problem. Thank you again for your efforts.

The 10k QPS is not actually the requirement for the API I’m currently testing; it’s another API’s performance requirement. I’m just using this API to run experiments.

Lastly, I have one more question to consult you on: regarding the solution @PlayStay mentioned above, where folks have split their data files, can you provide some examples for reference? I’ll then consider whether to optimize my k6 script.

I would guess @PlayStay was referencing the second of the old workarounds, the one presented with multiple CSV files.
With this script:

import { SharedArray } from 'k6/data';
import { sleep } from 'k6';

const dataFiles = [
  './data_1.json',
  './data_2.json',
  './data_3.json',
  './data_4.json',
  './data_5.json',
  './data_6.json',
  './data_7.json',
  './data_8.json',
  './data_9.json',
  './data_10.json',
];

let data;

if (__VU == 0) {
  // workaround to collect all files for the cloud execution or archives
  for (let i = 0; i < dataFiles.length; i++) {
    const dataFile = dataFiles[i];
    new SharedArray(dataFile, () => JSON.parse(open(dataFile)));
  }
} else {
  // each VU only parses the one file assigned to it
  const dataFile = dataFiles[__VU % dataFiles.length];
  data = new SharedArray(dataFile, () => JSON.parse(open(dataFile)));
}

export default function () {
  const user = data[0];
  sleep(1);
}

I got <3GB of starting memory usage, but it still took a minute to start the test.

I would argue this matters if you don’t have enough memory; otherwise it just means you need to split the files and then pick the correct one in each VU. Not exactly the hardest thing, but arguably not something you need to do if it isn’t needed.
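
If you do go that route, the splitting itself is a one-off job outside of k6. A hypothetical Node.js sketch (the file names, the chunk count, and the assumption that big_data.json holds a single top-level JSON array are all just for illustration) that produces the data_1.json ... data_10.json files used above:

// split.js - run once with `node split.js`
const fs = require('fs');

const records = JSON.parse(fs.readFileSync('./big_data.json', 'utf8'));
const parts = 10;
const chunkSize = Math.ceil(records.length / parts);

for (let i = 0; i < parts; i++) {
  const chunk = records.slice(i * chunkSize, (i + 1) * chunkSize);
  fs.writeFileSync(`./data_${i + 1}.json`, JSON.stringify(chunk));
}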

Hi @sunnini, here’s an example of a scenario I use.

// peak, peak_ramp, peak_sustain, ramp_down and after_peak_delay are assumed
// to be defined elsewhere in the script (e.g. derived from environment variables)
export let options = {
    scenarios: {
        peak: {
            // peak scenario name
            executor: 'ramping-arrival-rate',
            startRate: 0,
            timeUnit: '1s',
            preAllocatedVUs: 50,
            maxVUs: 20000,
            stages: [
                { target: peak, duration: peak_ramp },
                { target: peak, duration: peak_sustain },
                { target: 0, duration: ramp_down },
            ],
            gracefulStop: after_peak_delay, // how long to let in-flight iterations finish at the end
            tags: { test_type: 'peak' }, // extra tags for the metrics generated by this scenario
            exec: 'peak_gate_rush', // the function this scenario will execute
        },
    },
};

This produces the desired load profile and keeps throughput constant marvelously. There are some drawbacks when the system under test becomes slow to respond, which is very expensive in terms of VU usage, but that’s another story.
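
The function referenced by exec: 'peak_gate_rush' isn’t shown in the snippet above; a hypothetical sketch of what such an exported function could look like (the URL and check are placeholders, not the real test logic):

import http from 'k6/http';
import { check } from 'k6';

export function peak_gate_rush() {
  const res = http.get('https://test.example.com/gate'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
}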

Here’s an example of how I solved my memory issues; it does not require all the files to be loaded at script startup. As I described in the procedure above, I create 25 data files, each less than 25 MB and containing 100K records. I append a number to the partitioned data files and then select a file based on the mod operator, so that only one file of a manageable size is loaded rather than the whole data set. I forget why I chose 15000, but it works for me :smiley:. In my prod environment we use the million records partitioned into 25 data files (yes, we load test in production at scale). For staging and non-prod we only use one data file, so I hard-code the number selector to 0, hence the switch statement.

let count;
switch (deploymentEnvironment) {
    case "prod":
        // pick one of the 25 partitioned data files pseudo-randomly
        count = Math.floor(Math.random() * 15000) % 25;
        break;
    case "staging":
        count = 0;
        break;
}

// data_file is the base path/prefix of the partitioned files, defined elsewhere
const accounts = new SharedArray('accounts', function () {
    return JSON.parse(open(data_file + count)).key;
});

Seed data example with one record:

{
  "key": [
    {
      "ACCOUNT_ID": "accountID"
    }
  ]
}
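
With that seed format, each element of the SharedArray is one object from the key array. A hypothetical usage sketch inside an iteration (it relies on the accounts SharedArray built above):

export default function () {
  // pick a random account from the SharedArray built in the init context
  const account = accounts[Math.floor(Math.random() * accounts.length)];
  // account.ACCOUNT_ID holds the value ("accountID" in the one-record example)
}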

Hope this helps.

Hello @mstoykov @PlayStay, sorry for the late reply. With your help, I solved the problem successfully. Thanks very much again.
