Hey folks. Based on this page - Cloud IPs - I’m trying to figure out a way to calculate the most efficient use of VUs for the arrival-rate executor. As you know, it’s great for maintaining throughput under variable latency by dynamically adjusting VUs throughout a test scenario. Love it! However, it’s expensive on VUh usage. I’m zeroing in on a crude way to calculate what to set for maxVUs when using arrival-rate and ramping-arrival-rate, based on the response time of my service under test, but it’s proving to be a brittle solution.
My biggest problem is how k6 decides which instance tier to use based on VUs. Given variables such as LZ count, think time, response time, and VUs: if I get it wrong, I max out CPU on the instance(s) in the LZ; if I kinda get it right, one scenario works fine but another runs out of VUs and truncates load before it reaches the target rate. The only time I get a clean run is when I give up and just allocate 1 VU per unit of transaction rate, which is, again, expensive.
So again: any thoughts on how to optimize usage of load generation instances within an LZ using ramping-arrival-rate? I’d like to take my two known values, target rate and expected response time, and determine the minimum value to set for maxVUs on any given test.
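For reference, here’s the rough back-of-envelope calculation I’ve been leaning on (it’s just Little’s law; the variable names and the 1.5x safety factor are my own placeholders, not anything from the k6 docs):

// crude estimate: concurrent VUs needed ≈ arrival rate * iteration duration (Little's law)
const targetRate = 3000;        // requests per second I want from a single LZ
const expectedRespTime = 0.25;  // expected response time per request, in seconds
const safetyFactor = 1.5;       // headroom so latency spikes don't drop iterations
const estimatedMaxVUs = Math.ceil(targetRate * expectedRespTime * safetyFactor);
// e.g. 3000 * 0.25 * 1.5 = 1125 VUs

It works on paper, but the moment response times drift from the estimate it either under-provisions (dropped iterations) or over-provisions (wasted VUh), which is the brittleness I mentioned.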
In this case I’m trying to drive no more than 3000 RPS from any given load zone; ultimately I want to run a test generating 30K (3K per LZ) regardless of the testing use case. Here’s the load-zone math I’m working from, followed by an example of my scenario options.
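Just arithmetic, with variable names made up for illustration:

// the per-LZ cap stays fixed, so the LZ count falls out of the total target
const totalTargetRPS = 30000;  // overall rate I eventually want
const perLZCapRPS = 3000;      // the most I want out of any single LZ
const lzCount = totalTargetRPS / perLZCapRPS;  // 10 load zones
const percentPerLZ = 100 / lzCount;            // 10% each, like the commented-out distribution entries below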
// excerpt of my options object (the scenarios block itself is omitted here)
export const options = {
noConnectionReuse: options_trigger_true_false,
noVUConnectionReuse: options_trigger_true_false,
thresholds: {
// we can set different thresholds for the different scenarios because
// of the extra metric tags we set!
'http_req_duration{test_type:peak}': [{ threshold: 'med<200', abortOnFail: latency_trigger_true_false, delayAbortEval: '180s' }],
'http_req_duration{test_type:gate_rush}': [{ threshold: 'med<400', abortOnFail: latency_trigger_true_false, delayAbortEval: '180s' }],
// we can reference the scenario names as well
'http_req_failed{scenario:peak}': [{ threshold: 'rate < 0.05', abortOnFail: error_trigger_true_false, delayAbortEval: '180s' }],
'http_req_failed{scenario:gate_rush}': [{ threshold: 'rate < 0.05', abortOnFail: error_trigger_true_false, delayAbortEval: '180s' }],
'vus_max': [{ threshold: `value < ${maxvu_allow}`, abortOnFail: vu_trigger_true_false, delayAbortEval: '180s' }],
},
discardResponseBodies: false,
summaryTrendStats: ['avg', 'min', 'max', 'p(95)', 'p(99)'],
insecureSkipTLSVerify: error_trigger_true_false,
ext: {
loadimpact: {
distribution: {
ashburnDistribution1: { loadZone: 'amazon:us:ashburn', percent: 100 },
// ashburnDistribution2: { loadZone: 'amazon:us:ashburn', percent: 50 },
/* // dublinDistribution: { loadZone: 'amazon:ie:dublin', percent: 10},
// capeTownDistribution: { loadZone: 'amazon:sa:cape town', percent: 10},
// hongKongDistribution: { loadZone: 'amazon:cn:hong kong', percent: 10},
// mumbaiDistribution: { loadZone: 'amazon:in:mumbai', percent: 10},
osakaDistribution: { loadZone: 'amazon:jp:osaka', percent: 10},
seoulDistribution: { loadZone: 'amazon:kr:seoul', percent: 10},
singaporeDistribution: { loadZone: 'amazon:sg:singapore', percent: 10},
sydneyDistribution: { loadZone: 'amazon:au:sydney', percent: 10},
tokyoDistribution: { loadZone: 'amazon:jp:tokyo', percent: 10},
// montrealDistribution: { loadZone: 'amazon:ca:montreal', percent: 10},
frankfurtDistribution: { loadZone: 'amazon:de:frankfurt', percent: 10},
londonDistribution: { loadZone: 'amazon:gb:london', percent: 10},
// milanDistribution: { loadZone: 'amazon:it:milan', percent: 10},
// parisDistribution: { loadZone: 'amazon:fr:paris', percent: 10},
// stockholmDistribution: { loadZone: 'amazon:se:stockholm', percent: 10},
// bahrainDistribution: { loadZone: 'amazon:bh:bahrain', percent: 10},
saoPauloDistribution: { loadZone: 'amazon:br:sao paulo', percent: 10},
// paloAltoDistribution: { loadZone: 'amazon:us:palo alto', percent: 10},
portlandDistribution: { loadZone: 'amazon:us:portland', percent: 10 }, */
},
projectID: nnnnnn,
name: 'Accounts Smoke Test'
}
}
};
Sorry, forgot to add that the target rates for these two scenarios are different. For instance:
peak target - 1000 RPS
gate_rush target - 3000 RPS
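And here’s roughly what those two scenario blocks look like (the stage durations and VU counts are illustrative placeholders, not my exact values):

scenarios: {
  peak: {
    executor: 'ramping-arrival-rate',
    startRate: 0,
    timeUnit: '1s',
    preAllocatedVUs: 250, // ~1000 RPS * ~0.25s expected response time
    maxVUs: 400,          // the number I'm trying to derive instead of guessing
    stages: [
      { target: 1000, duration: '5m' },
      { target: 1000, duration: '10m' },
      { target: 0, duration: '2m' },
    ],
    tags: { test_type: 'peak' },
  },
  gate_rush: {
    executor: 'ramping-arrival-rate',
    startRate: 0,
    timeUnit: '1s',
    preAllocatedVUs: 750, // ~3000 RPS * ~0.25s
    maxVUs: 1200,
    stages: [
      { target: 3000, duration: '5m' },
      { target: 3000, duration: '10m' },
      { target: 0, duration: '2m' },
    ],
    tags: { test_type: 'gate_rush' },
  },
},

The per-scenario test_type tag is what the http_req_duration thresholds in the options above key off.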
By my observations, for my test conditions, 1K RPS should be possible with 200-300 VUs spread amongst the LZ instance(s) and IP(s), and 3K should be possible with a capacity of 1K VUs. However, what I’m seeing is that at 3K RPS the single m5.large instance maxes out CPU on a number of the LGs (load generator IPs). When I add a second Ashburn LZ the CPU issue resolves, but having to use two LZs in the same region is kind of a pain. I could change think time/pacing, but then I’m playing whack-a-mole…
Thanks in advance,
PlayStay