Hi, does anyone have any advice about using k6 for benchmarking individual microservices (ideally in a CI pipeline)?
Also, any advice on using k6 as part of a chaos engineering toolset against an EKS cluster would be brilliant!
CI pipeline? OK, for the benchmarking part I would look into blue-green testing.
Chaos? I love k6, and I would give it a chance by applying the same strategy that Gremlin used with JMeter. The strategy basically applies to other load testing tools as well: >>here
Hi @sstratton,
does anyone have any advice about using k6 for benchmarking individual microservices (ideally in a CI pipeline)?
What are your requirements? Just benchmarking individual microservices is kinda vague. If you're looking to get separate timings for each microservice, I might have some ideas, but it would definitely help to have some more context.
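For instance, purely as a sketch with made-up service names, URLs and limits: you can tag requests per service and set thresholds on the tagged sub-metrics, which gives you separate timings (and pass/fail criteria) per microservice.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 5,
  duration: '1m',
  thresholds: {
    // separate latency budgets per service, evaluated on the tagged sub-metrics
    'http_req_duration{service:users}': ['p(95)<250'],
    'http_req_duration{service:orders}': ['p(95)<400'],
  },
};

export default function () {
  // service names and URLs are placeholders for your own microservices
  http.get('http://users.internal/health', { tags: { service: 'users' } });
  http.get('http://orders.internal/health', { tags: { service: 'orders' } });
  sleep(1);
}
```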
Also any advice on using k6 as part of a chaos engineering toolset against an EKS cluster would be brilliant!
Did you have anything specific in mind? For instance, a particular chaos engineering tool?
As long as you're able to connect to the control plane via kubectl, something like xk6-chaos or chaostoolkit-k6 might be suitable for the task. Both are highly experimental though, so filing issues in the repos as you bump into problems will definitely be necessary (and highly appreciated).
If there is anything else I can assist you with regarding your chaos endeavours, just let me know.
Best
Simme
Hi @simme,
Thanks - I'll try both of those chaos engineering links. I love the xk6-chaos
extension, which is a really cool approach. The toolkit will be useful as well.
With benchmarking microservices, I'm taking the approach that:
3-5 would need k6 running against the service (with mocks) in order to get results.
With 3-4 I'm not sure how to do the "comparison with a previous run" bit. Maybe Cloud Profiler too… I haven't tried this yet or found any other frameworks.
For 5 (latency/response times) I'm guessing (a rough sketch is at the end of this post):
Does this seem like the best approach?
Ultimately I'd like the services in high-risk areas to be:
And for the solution to:
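Here is roughly what I have in mind for the latency/response-time thresholds (point 5). A minimal k6 sketch, where the URL and numbers are placeholders and http_req_failed needs k6 v0.31 or newer; if a threshold is breached the run exits non-zero, which fails the CI job:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
  thresholds: {
    // if either threshold is crossed, k6 exits non-zero and the CI job fails
    http_req_duration: ['p(95)<300', 'p(99)<800'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  // placeholder endpoint for the service under test (running against mocks)
  const res = http.get('http://service-under-test.local/api/v1/items');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```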
I'd like to share my thoughts:
I'm afraid you will need to coordinate with the dev and ops teams in order to spin instances up and down on demand, via GitOps, CircleCI, Weaveworks, etc.,
where each instance contains the respective code/feature version.
To profile the CPU, I think you will need a monitoring tool running as a sidecar alongside the CI job, so it can detect when CPU thresholds are reached and raise a flag (which allows the job to reject the merge request / pull request).
For every execution you run, you ought to save/dump the metrics to InfluxDB so you can analyze them in Grafana after the tests are done (observability).
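For example, a small sketch (the InfluxDB URL, tag and metric names are only illustrative): add a custom Trend metric in the script and ship everything to InfluxDB with the built-in output, then build Grafana panels on top of it.

```javascript
// Run with something like (InfluxDB v1.x output; adjust URL/database to your setup):
//   k6 run --out influxdb=http://localhost:8086/k6 --tag testid=build-123 script.js
import http from 'k6/http';
import { Trend } from 'k6/metrics';

// custom time-based metric; it lands in InfluxDB next to the built-in metrics
const checkoutDuration = new Trend('checkout_duration', true);

export default function () {
  const res = http.get('http://checkout.internal/health'); // placeholder URL
  checkoutDuration.add(res.timings.duration);
}
```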
You could for instance use the Prometheus Go client to expose metrics from within the actual execution path of the code you're testing. This would in turn allow you to work with the dev team to add metrics for internal timings, which could prove useful while benchmarking.
If you're running on GCP, using Google Cloud Profiler definitely makes sense. Worst case, you'd be able to export both RAM and CPU usage through the Prometheus Go client as well, although it takes a little more tinkering to set up.
I agree with what you've listed here. Perhaps @nicole can provide some additional insights from the perspective of good performance testing practices.
Yes. Adding a threshold for the runner's CPU usage is a good way of staying on top of this. It will of course still fail the build, which I personally prefer to the alternative, but at least you'll get a clear indicator of the cause, allowing you to simply rerun the test.
Sounds like a solid approach. I wouldn't be too concerned about the job failing or getting a false positive. As long as you have ways to find out why it failed (such as by using monitoring metrics and thresholds in k6, as Simme suggested), a failure can still be useful information.
And on that note, I just wanted to add that I don't think capturing some historical results is optional. It's not always immediately obvious from just the last result that there's a problem. You might ignore a failure once as an outlier, but if you later spot a pattern of failures (slow response times every month/quarter at a certain time), you'll be glad you kept data for what you thought were "random" failures.
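A lightweight way to keep that history (just a sketch; the BUILD_ID variable and file naming are assumptions about your CI setup) is to have each run write its end-of-test summary to a JSON file via handleSummary() (available since k6 v0.30) and archive those files as build artifacts:

```javascript
import http from 'k6/http';

export default function () {
  http.get('http://service-under-test.local/health'); // placeholder endpoint
}

// handleSummary() replaces the default end-of-test summary; returning a map of
// filename -> content writes each entry to disk when the run finishes.
export function handleSummary(data) {
  // BUILD_ID is a hypothetical CI environment variable; fall back to 'local'
  const name = `summary-${__ENV.BUILD_ID || 'local'}.json`;
  return {
    [name]: JSON.stringify(data, null, 2),
  };
}
```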