Hi, does anyone have any advice about using k6 for benchmarking individual microservices (ideally in a CI pipeline)?
Also, any advice on using k6 as part of a chaos engineering toolset against an EKS cluster would be brilliant!
CI pipeline? OK, for the benchmarking part I would look into blue-green testing.
Chaos? I love k6, and I would give it a chance by applying the same strategy that Gremlin used with JMeter. The strategy basically applies to other load testing tools as well: >>here
Hi @sstratton,
does anyone have any advice about using k6 for benchmarking individual microservices (ideally in a CI pipeline)?
What are your requirements? Just benchmarking individual microservices is kinda vague. If you're looking to get separate timings for each microservice, I might have some ideas, but it would definitely help to have some more context.
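For instance, purely as a sketch with made-up service names, URLs and limits: you can tag requests per service and set thresholds on the tagged sub-metrics, which gives you separate timings (and pass/fail criteria) per microservice.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 5,
  duration: '1m',
  thresholds: {
    // separate latency budgets per service, evaluated on the tagged sub-metrics
    'http_req_duration{service:users}': ['p(95)<250'],
    'http_req_duration{service:orders}': ['p(95)<400'],
  },
};

export default function () {
  // service names and URLs are placeholders for your own microservices
  http.get('http://users.internal/health', { tags: { service: 'users' } });
  http.get('http://orders.internal/health', { tags: { service: 'orders' } });
  sleep(1);
}
```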
Also any advice on using k6 as part of a chaos engineering toolset against an EKS cluster would be brilliant!
Did you have anything specific in mind? For instance, a particular chaos engineering tool?
As long as you're able to connect to the control plane via kubectl, something like xk6-chaos or chaostoolkit-k6 might be suitable for the task. Both are highly experimental though, so filing issues in the repos as you bump into problems will definitely be necessary (and highly appreciated).
If there is anything else I can assist you with regarding your chaos endeavours, just let me know.
Best
Simme
Hi @simme,
Thanks - I'll try both of those chaos engineering links. I love the xk6-chaos
extension, which is a really cool approach. The toolkit will be useful as well.
With benchmarking microservices, I'm taking the approach that:
3-5 would need k6 running against the service (with mocks) in order to get results.
With 3-4 I'm not sure how to do the "comparison with a previous run" bit. Maybe Cloud Profiler too… I haven't tried this yet or found any other frameworks.
For 5 (latency/response times) I'm guessing (a rough sketch is at the end of this post):
Does this seem like the best approach?
Ultimately I'd like the services in high-risk areas to be:
And for the solution to:
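Here is roughly what I have in mind for the latency/response-time thresholds (point 5). A minimal k6 sketch, where the URL and numbers are placeholders and http_req_failed needs k6 v0.31 or newer; if a threshold is breached the run exits non-zero, which fails the CI job:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
  thresholds: {
    // if either threshold is crossed, k6 exits non-zero and the CI job fails
    http_req_duration: ['p(95)<300', 'p(99)<800'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  // placeholder endpoint for the service under test (running against mocks)
  const res = http.get('http://service-under-test.local/api/v1/items');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```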
I'd like to share my thoughts:
I'm afraid you will need to coordinate with the dev and ops teams in order to spin instances up and down on demand, via GitOps, CircleCI, Weaveworks, etc.,
where each instance contains the respective code/feature version.
To profile the CPU, I think you will need a monitoring tool running as a sidecar alongside the CI job, so it can detect when CPU thresholds are reached and raise a flag (which allows the job to reject the merge request / pull request).
For every execution you run, you ought to save/dump the metrics to InfluxDB so you can analyze them in Grafana after the tests are done (observability).
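For example, a small sketch (the InfluxDB URL, tag and metric names are only illustrative): add a custom Trend metric in the script and ship everything to InfluxDB with the built-in output, then build Grafana panels on top of it.

```javascript
// Run with something like (InfluxDB v1.x output; adjust URL/database to your setup):
//   k6 run --out influxdb=http://localhost:8086/k6 --tag testid=build-123 script.js
import http from 'k6/http';
import { Trend } from 'k6/metrics';

// custom time-based metric; it lands in InfluxDB next to the built-in metrics
const checkoutDuration = new Trend('checkout_duration', true);

export default function () {
  const res = http.get('http://checkout.internal/health'); // placeholder URL
  checkoutDuration.add(res.timings.duration);
}
```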
You could for instance use the Prometheus Go client to expose metrics from within the actual execution path of the code you're testing. This would in turn allow you to work with the dev team to add metrics for internal timings, which could prove useful while benchmarking.
If you're running on GCP, using Google Cloud Profiler definitely makes sense. Worst case, you'd be able to export both RAM and CPU usage through the Prometheus Go client as well, although it takes a little more tinkering to set up.
I agree with what you've listed here. Perhaps @nicole can provide some additional insights from the perspective of good performance testing practices.
Yes. Adding a threshold for the runner's CPU usage is a good way of staying on top of this. It will of course still fail the build, which I personally prefer to the alternative, but at least you'll get a clear indicator of the cause, allowing you to simply rerun the test.
Sounds like a solid approach. I wouldn't be too concerned about the job failing or getting a false positive. As long as you have ways to find out why it failed (such as by using monitoring metrics and thresholds in k6, as Simme suggested), a failure can still be useful information.
And on that note, I just wanted to add that I don't think capturing some historical results is optional. It's not always immediately obvious from just the last result that there's a problem. You might ignore a failure once as an outlier, but if you later spot a pattern of failures (slow response times every month/quarter at a certain time), you'll be glad you kept data for what you thought were "random" failures.
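A lightweight way to keep that history (just a sketch; the BUILD_ID variable and file naming are assumptions about your CI setup) is to have each run write its end-of-test summary to a JSON file via handleSummary() (available since k6 v0.30) and archive those files as build artifacts:

```javascript
import http from 'k6/http';

export default function () {
  http.get('http://service-under-test.local/health'); // placeholder endpoint
}

// handleSummary() replaces the default end-of-test summary; returning a map of
// filename -> content writes each entry to disk when the run finishes.
export function handleSummary(data) {
  // BUILD_ID is a hypothetical CI environment variable; fall back to 'local'
  const name = `summary-${__ENV.BUILD_ID || 'local'}.json`;
  return {
    [name]: JSON.stringify(data, null, 2),
  };
}
```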