Load testing - spike vs stress to determine performance

In my load testing so far, I have mostly been running spike tests on the SUT (System Under Test). We step up the number of users hitting the server: 10, then 100, then 1000. We log the API execution time separately from the response time. For a spike of 1000 users arriving at a single point in time, these are the kind of results we get:

export const options: any = {
  scenarios: {
    [Bu]: {
      executor: 'per-vu-iterations',
      exec: 'mRun',
      vus: 1000,
      iterations: Iteration,
      maxDuration: '1000m',
      tags: { tag_index: Index },
      env: {
        bu: Bu,
        // remaining entries are cut off in the original post
      },
    },
  },
};

export function mRun() {
  // code to log res.timings.duration
}

response time: 0.1s   api execution time: 0.3s
response time: 0.11s  api execution time: 0.4s
response time: 0.15s  api execution time: 0.3s
response time: 0.7s   api execution time: 0.5s
response time: 0.9s   api execution time: 0.3s

... 500th request processed by the server
response time: 7s       api execution time: 0.6s
response time: 7.1s     api execution time: 0.5s
response time: 7.2s     api execution time: 0.4s

... 1000th request processed by the server
response time: 11s        api execution time: 0.6s
response time: 11.1s      api execution time: 0.5s

We suspect the increasing response time is due to requests queuing up on the server.
As the observations above show, the execution time stays within roughly the same range (0.3s to 0.6s), so it is the queuing of requests that gives us a seemingly bad response time.
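
The queuing explanation can be checked with a back-of-the-envelope model. The sketch below is only an idealization, not a measurement of any real server: it assumes all requests arrive at the same instant, a fixed pool of workers (the count of 50 is made up), FIFO processing, and a constant service time of 0.5s. Under those assumptions the per-request execution time stays flat while the response time grows with the position in the queue, which is the same shape as the log above.

```typescript
// Idealized model: every request arrives at t = 0 and is served FIFO by a
// fixed pool of workers, each taking the same serviceTime (seconds) per request.
function spikeResponseTimes(requests: number, workers: number, serviceTime: number): number[] {
  const times: number[] = [];
  for (let i = 0; i < requests; i++) {
    // Request i is served in batch floor(i / workers) and waits for all earlier batches.
    const batch = Math.floor(i / workers);
    times.push((batch + 1) * serviceTime);
  }
  return times;
}

// Hypothetical numbers: 1000 simultaneous requests, 50 workers, 0.5s per request.
const times = spikeResponseTimes(1000, 50, 0.5);
console.log(times[0], times[499], times[999]); // prints: 0.5 5 10
```

The execution time is 0.5s for every request in this model, yet the 1000th request sees a 10s response time purely from waiting.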

So, what confuses me is whether a spike test is actually a performance test. To me it seems more like a concurrency test, where we check whether the system starts failing at some rate, and logging response times from it doesn’t seem to have more value than logging them from a SUT under continuous expected load. These are just my thoughts, and I’d like more views on this point.

So, if you logged the response times of a SUT in a spike test and in a stress test, how would you interpret those two sets of data? And which test (or test data) is better suited for estimating and benchmarking the response times of a SUT?

Hi there!

Have you read our “Stress testing” guide? I think you’ll find it useful to answer some of your questions.

But to summarize: you can use both stress and spike testing to measure the performance of your system. The difference is in the behavior that you want to model with your tests. A stress test gradually increases the load to reach your SUT’s breaking point, while a spike test suddenly increases the load to overwhelm the SUT. You can keep track of the response time and any other metrics in both scenarios, but they would tell you different things in each case.
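
To make the difference in load shape concrete, here is a rough sketch of both profiles using k6’s `ramping-vus` executor. All durations and VU targets are placeholder values rather than recommendations, and `buildOptions` is just a helper name made up for this example.

```typescript
type Stage = { duration: string; target: number };

// Stress profile: step the load up gradually and hold each level.
const stressStages: Stage[] = [
  { duration: '2m', target: 100 }, // ramp up to a first load level
  { duration: '5m', target: 100 }, // hold it and observe
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 400 }, // keep stepping up toward the breaking point
  { duration: '5m', target: 400 },
  { duration: '5m', target: 0 },   // ramp down and watch the recovery
];

// Spike profile: light baseline, sudden burst, then baseline again.
const spikeStages: Stage[] = [
  { duration: '1m', target: 10 },    // light baseline load
  { duration: '10s', target: 1000 }, // sudden burst
  { duration: '1m', target: 1000 },
  { duration: '10s', target: 10 },   // burst ends
  { duration: '3m', target: 10 },    // does latency return to normal?
  { duration: '10s', target: 0 },
];

function buildOptions(profile: 'stress' | 'spike') {
  return {
    scenarios: {
      [profile]: {
        executor: 'ramping-vus',
        startVUs: 0,
        stages: profile === 'stress' ? stressStages : spikeStages,
      },
    },
  };
}

// In a k6 script: export const options = buildOptions('spike');
```

The spike profile keeps a small baseline load after the burst on purpose: with zero VUs there would be no requests left to show whether response times recover.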

For example, during a stress test, you first might observe gradual response time increases, then HTTP 500 errors, and finally TCP connection errors (i/o timeout, connection reset by peer, etc.). Depending on how your SUT is deployed, these could indicate problems with your autoscaler, a bottleneck in your backend service, or in your load balancer. Since the test is typically longer than a spike test, the focus would be on how well the system handles sustained load at each step, and could point you in different directions to optimize and scale the system.

A spike test, in turn, models the behavior of quick load bursts, so it might start showing you TCP connection errors right away. This would allow you to see more quickly how, and if, the system recovers after the load subsides.

In both cases it’s important that the system does recover, but the focus of your metric tracking will be different.

Hope this clears some things up. In general, you don’t need to focus too much on the textbook definitions of each test type. After all, you can combine them, for example with a stress test that has sudden spikes, if that’s what you need. The important thing is to observe the effects that each load change has on your system, so you can see which part you need to optimize.


@imiric, so when I spike a SUT with 500 requests in 1 second and k6 gives me a response time of 150s, that doesn’t mean I should take the average response time to be 150s, right? So with this test I can only determine whether the system recovers, right?

I also have a conceptual difficulty with what the average response time of an endpoint would even be, because that is load dependent, right? Under minimal load the SUT might give good response times, and under high load they might be bad. What would be a way to benchmark the EUT (Endpoint under test :wink:)?

Is that 150s response time for a single request, or are you averaging over 1s? Judging by your original post, you seem to be logging each res.timings.duration. May I ask why? If you want to see the metrics for each request, you can use the CSV or JSON output. If you want to see aggregated results, you can use the end-of-test summary, or send the metrics to any output of your choice, and aggregate them there.
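
For the end-of-test summary, k6 calls a `handleSummary` function at the end of the run if the script exports one. A minimal sketch of that, with some assumptions: the `SummaryData` type is a hand-written simplification of what k6 passes in, the file name `summary.json` is arbitrary, and in a real script the function needs the `export` keyword.

```typescript
// Simplified shape of the object passed to handleSummary (assumption: only
// the fields used below are declared here).
type SummaryData = {
  metrics: Record<string, { values: Record<string, number> }>;
};

// In a real k6 script: `export function handleSummary(data) { ... }`.
function handleSummary(data: SummaryData): Record<string, string> {
  const d = data.metrics['http_req_duration'].values; // values are in milliseconds
  const line = `http_req_duration med=${d['med']}ms p(95)=${d['p(95)']}ms max=${d['max']}ms\n`;
  return {
    stdout: line,                                  // printed to the terminal
    'summary.json': JSON.stringify(data, null, 2), // full summary written to a file
  };
}
```

This replaces per-request logging inside the test function with a single aggregated report at the end.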

In general, averages are a poor way of judging performance, since outliers get hidden in them. Instead, you should focus on percentiles, specifically p(95), or even higher ones such as p(99). See the --summary-trend-stats option.
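
A small illustration of why the average misleads, using made-up durations (90 fast requests and 10 that sat in a queue). The percentile here uses the simple nearest-rank method, which is an assumption for this sketch; the exact values k6 reports may differ slightly because its calculation is not necessarily the same.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function average(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical response times in ms.
const durations: number[] = [
  ...new Array<number>(90).fill(100),
  ...new Array<number>(10).fill(7000),
];

console.log(average(durations));        // 790
console.log(percentile(durations, 50)); // 100
console.log(percentile(durations, 95)); // 7000
```

The 790ms average describes neither group: most requests took 100ms, and the slow ones took 7s, which only the p(95) reveals.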

The response time of your service really depends on what you want it to be. :slight_smile: If every millisecond counts for your product, then 200ms might be unacceptable, whereas if you’re running something like a personal blog, then that result might be perfectly fine. The point of load testing is to measure how your service behaves under different scenarios, so you can optimize the performance before it affects your users, and thus your product. You need to decide what that performance threshold should be, and how much you want to improve your result.

> What would be a way to benchmark the EUT (Endpoint under test)

You can set thresholds per URL, or aggregate metrics manually per URL. Please go through the documentation and see some examples to get a better idea of what’s possible.
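
As a rough idea of what per-endpoint thresholds could look like, the sketch below builds a `thresholds` map keyed on the `name` tag. The endpoint names and latency budgets are invented for the example, `buildThresholds` is a hypothetical helper, and it assumes each request in the test is tagged with a matching `name`.

```typescript
// Hypothetical endpoints and their p(95) latency budgets in ms.
const p95BudgetMs: Record<string, number> = {
  login: 500,
  search: 1000,
};

function buildThresholds(budgets: Record<string, number>): Record<string, string[]> {
  const thresholds: Record<string, string[]> = {};
  for (const name of Object.keys(budgets)) {
    // Sub-metric of http_req_duration, filtered by the `name` tag.
    thresholds[`http_req_duration{name:${name}}`] = [`p(95)<${budgets[name]}`];
  }
  return thresholds;
}

const perEndpointOptions = {
  thresholds: buildThresholds(p95BudgetMs),
  // Show percentiles, not just the average, in the end-of-test summary.
  summaryTrendStats: ['avg', 'med', 'p(95)', 'p(99)', 'max'],
};

// In a k6 script: export const options = perEndpointOptions;
// and tag each request, e.g. http.get(url, { tags: { name: 'login' } });
```

Running the same thresholds under different load profiles then gives a pass/fail answer per endpoint for each load level, instead of a single "average response time".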