Calculate the accurate size in bytes of a string in k6

It looks like k6 doesn’t provide a built-in way to measure the UTF-8 byte size of a string (something like Node’s Buffer), so I can’t get an accurate size calculation. My workaround assumes most characters take 1-2 bytes, but that isn’t precise, especially when the data contains characters from different languages or emojis.
Does anyone know of a better way?

My use case is bulk inserting data into Salesforce with k6. The CSV file is too large for the Salesforce Bulk API to handle in a single request, so I have to split it into chunks, and I need the size calculation to decide where the script should cut each chunk.

Hi @zhaozhang7281, welcome to the community forum!

By specification, ECMAScript strings are UTF-16, so each code unit takes 2 bytes (characters outside the Basic Multilingual Plane, such as most emojis, take two code units).
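A quick way to see what `length` actually measures. This is only a minimal sketch, and `utf16Info` is a hypothetical helper name, not something k6 provides:

```javascript
// Hypothetical helper: reports how a string is stored as UTF-16.
function utf16Info(s) {
  return {
    codeUnits: s.length,                // String.length counts UTF-16 code units
    codePoints: Array.from(s).length,   // Array.from iterates by code point
    utf16Bytes: s.length * 2,           // each code unit is 2 bytes
  };
}

console.log(JSON.stringify(utf16Info("я")));  // {"codeUnits":1,"codePoints":1,"utf16Bytes":2}
console.log(JSON.stringify(utf16Info("😀"))); // {"codeUnits":2,"codePoints":1,"utf16Bytes":4}
```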

But in transmission, as in sending it over the network, we usually use UTF-8 (I actually can’t remember a time when it wasn’t practically the only option).

There is a living standard for encoding strings to and from UTF-8 that is implemented in browsers, but not in k6.

Luckily there is a polyfill that works fairly okay:

import { TextEncoder } from "https://raw.githubusercontent.com/inexorabletash/text-encoding/master/index.js";

const te = new TextEncoder(); // encodes to UTF-8

console.log(te.encode("someString").byteLength); // 10 (ASCII: 1 byte per character)

console.log(te.encode("Нещо на Български").byteLength); // 32 (Cyrillic letters: 2 bytes each)

export default function () {} // empty default function, just so k6 doesn't error
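If loading a polyfill from a remote URL is not an option, the UTF-8 size can also be computed directly from the code points, without building the encoded buffer. The following is only a sketch under that assumption; `utf8ByteLength` is a hypothetical helper name, not a k6 API:

```javascript
// Hypothetical helper (not part of k6): counts the bytes a string would
// occupy when encoded as UTF-8, without allocating the encoded buffer.
function utf8ByteLength(str) {
  let bytes = 0;
  // for...of iterates by code point, so a surrogate pair is visited once.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) bytes += 1;        // ASCII
    else if (cp <= 0x7ff) bytes += 2;  // e.g. Cyrillic, Greek, Latin accents
    else if (cp <= 0xffff) bytes += 3; // rest of the Basic Multilingual Plane
    else bytes += 4;                   // e.g. most emojis
  }
  return bytes;
}

console.log(utf8ByteLength("someString"));        // 10
console.log(utf8ByteLength("Нещо на Български")); // 32
```

In a k6 script the function would be called from the default function or from the init code, in the same way as the polyfill above.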

But to be honest, in your case it will probably be better to just assume that each character takes 2-3 bytes, do the calculation based on that, and move on.

That will be much faster to compute, and it will likely only cost you one or two extra batches to send.
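The estimate-based approach could look roughly like this. It relies on the fact that a UTF-16 code unit never needs more than 3 bytes in UTF-8, so `length * 3` is a safe upper bound. The names `maxUtf8Bytes` and `chunkRows`, the `maxBytes` budget, and repeating the CSV header in every chunk are all assumptions made for illustration:

```javascript
// Upper bound on the UTF-8 size of a string: every UTF-16 code unit
// encodes to at most 3 bytes (a surrogate pair is 2 units -> 4 bytes).
function maxUtf8Bytes(str) {
  return str.length * 3;
}

// Hypothetical helper: groups CSV rows into chunks whose estimated size
// stays at or below maxBytes. Assumes the header is repeated in every chunk.
function chunkRows(header, rows, maxBytes) {
  const chunks = [];
  const headerSize = maxUtf8Bytes(header) + 1; // +1 for the newline
  let current = [header];
  let size = headerSize;
  for (const row of rows) {
    const rowSize = maxUtf8Bytes(row) + 1;
    // Close the current chunk when this row would exceed the budget,
    // unless the chunk holds only the header (an oversized row goes alone).
    if (size + rowSize > maxBytes && current.length > 1) {
      chunks.push(current.join("\n") + "\n");
      current = [header];
      size = headerSize;
    }
    current.push(row);
    size += rowSize;
  }
  if (current.length > 1) chunks.push(current.join("\n") + "\n");
  return chunks;
}

const chunks = chunkRows("id,name", ["1,a", "2,b", "3,c"], 45);
console.log(chunks.length); // 2
```

Because the estimate is a worst case, chunks end up smaller than strictly necessary, which is where the extra batches come from.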

But it’s best to try it with your data and see how it goes.

Hope this helps you!