http_req_duration shows the wrong value in Datadog

Hi guys, I am having an issue with http_req_duration. In the pipeline it shows the correct med, but in Datadog it shows a completely different number. Any idea how to get it right?

Hi @phaltoe,

I ran some tests outputting to Datadog and wasn’t quite able to reproduce your situation, but I did notice some weirdness with the metrics in Datadog.

After a first test the values seemed to align:

But when I ran subsequent tests, the value in Datadog kept increasing, even though the med values I was seeing in k6’s CLI output stayed more or less the same (within a few ms). Currently my avg over the past 15 minutes is 140.80ms, which is higher than what k6 was showing me.

Digging into the Metrics Explorer shows strange spikes in the values, which I didn’t see in the local report:

It seems like an aggregation issue, either from k6->DogStatsD or from DogStatsD->Datadog, that ends up submitting higher values.

Can you share a screenshot of your k6.http_req_duration.median graph from the Metrics Explorer and confirm if you’re seeing the same behavior?

Also keep in mind that the value in the dashboard is an average over the time period you selected, which, depending on the range and your previous metrics, could skew the result you expect. So make sure that you’re viewing the correct time range.

Hi @imiric and @phaltoe,

I was able to reproduce this issue. Please see the following screenshot:

In k6, http_req_duration is p(95)=1.58ms, but Datadog shows 1.66 ms.

Reason: k6 uses DogStatsD to send data to Datadog. DogStatsD implements the StatsD protocol with some extensions. It aggregates multiple data points for each unique metric into a single data point over a period of time called the flush interval (ten seconds by default) and sends that to Datadog.

When Datadog visualizes the data, it aggregates again and computes the query value across the whole time window. This defaults to avg by, but you can change the method to max by, min by, or sum by.

Because of these two aggregation steps, we see a difference between the k6 results and the values visualized in Datadog.

Suggested Solution: We should be able to configure the Datadog agent to change the data flush interval.


From my understanding of this (and your explanation), the problem is that datadog-agent first aggregates and calculates a p95 per flush interval, and what is shown above is the average of those p95 values over the selected period. While that is probably useful enough for most people, it will likely never be exactly equal to the p95 k6 calculates, which is over the whole test and without any aggregation (for now; we do plan on aggregating, as the current behaviour is pretty … unoptimized and in practice leaks memory for big runs).
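
To make the difference concrete, here is a minimal Python sketch. The durations are made up, and the nearest-rank percentile is only an assumption for illustration (the agent's real percentile calculation may differ). It compares the p95 over all samples, which is roughly what k6 reports, with the average of per-interval p95 values, which is roughly what the dashboard shows:

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request durations (ms) for three 10 s flush intervals.
windows = [
    [100] * 95 + [200] * 5,
    [100] * 90 + [500] * 10,
    [100] * 99 + [1000] * 1,
]

# p95 over every sample of the test, with no intermediate aggregation.
all_samples = [d for w in windows for d in w]
overall_p95 = percentile(all_samples, 95)

# One p95 per flush interval, then averaged over the selected time range.
per_window_p95 = [percentile(w, 95) for w in windows]
avg_of_p95 = mean(per_window_p95)

print(f"overall p95:           {overall_p95} ms")     # 200 ms
print(f"per-interval p95:      {per_window_p95}")     # [100, 500, 100]
print(f"avg of per-interval:   {avg_of_p95:.2f} ms")  # 233.33 ms
```

Both numbers are "correct"; they just answer different questions, so they should not be expected to match exactly.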

Apparently there is also an open issue for the datadog-agent asking to make the flush interval configurable, but it looks like it’s going nowhere :man_shrugging:.

To be honest, it’s likely going to be very hard to make Datadog flush only once for the whole test or something like that. I would also argue it would be … not useful, as it would generate just 1 point for the p95 for the whole test (from my understanding, I might be completely wrong), which likely isn’t what you would want.

I would argue the behaviour @Dilshan_Fernando reported is within what I would consider reasonable, and likely what you will “want” from a practical perspective. The original report, however, was about completely different values, which at least to me seems both unrelated and definitely a problem somewhere. Given that k6 does not aggregate in this case … I don’t think it’s k6’s fault :wink: … but I am also a k6 developer so :stuck_out_tongue:

Hopefully @phaltoe has found a solution, or the reason behind it, and will report back :wink:

@mstoykov Thanks for the detail. I was able to reproduce this issue.

In k6, http_req_duration is p(95)=471.14ms, but Datadog shows a totally different value. Please see the following screenshots of the Datadog dashboard.

Any idea how to fix this issue?


@Dilshan_Fernando what values does it show if you click on the parts where the graph is lower?

Again, the k6 numbers are for the whole test, over all the requests. The Datadog ones are for some period (a few seconds). I would expect that where the bumps are there are fewer requests, but those take way longer. For example, there might be 10 requests in the (let’s say) 5s interval that Datadog is aggregating, some with low (sub 1s) and some with very high (20s+) values, while in the places where the values are low there are 1000 requests, all with fairly low values.
Looking at the graph, and at the fact that the p50, p95, max and avg coincide, it seems like there was either only 1 request or very few, mostly with the same times :man_shrugging:. Can you also graph the number of requests, http_reqs (or I guess Datadog will have a count)?
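
As a rough illustration of why the request count matters, here is a small Python sketch with invented numbers: one interval with 1000 fast requests and one with only 10 requests, a few of them very slow. The nearest-rank p95 used here is an assumption, not necessarily what datadog-agent does:

```python
import math


def p95(values):
    """Nearest-rank 95th percentile of a list of durations."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical durations in ms for two aggregation intervals.
busy_window = [200] * 1000               # many requests, all fast
quiet_window = [500] * 6 + [21000] * 4   # few requests, some 20s+

# Each interval becomes one point on the graph, regardless of its count.
for name, window in (("busy", busy_window), ("quiet", quiet_window)):
    print(f"{name}: count={len(window)} p95={p95(window)}ms")

# The end-of-test summary looks at every request at once.
whole_test_p95 = p95(busy_window + quiet_window)
print(f"whole test: p95={whole_test_p95}ms")
```

The quiet interval shows up as a 21000ms spike on the graph, while the p95 over the whole test stays at 200ms, because only 10 of the 1010 requests fall in that interval. Graphing the count next to the p95 makes this kind of spike easy to recognize.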

Looking at the k6 output, you seem to be using constant arrival rate, and you are more or less hitting the 1650 iterations (started) that you want. While some of them are dropped, I would expect more given the graph. Given these results, I would recommend that you start directly with 4k VUs and graph both the finished requests and the dropped iterations, to see whether you actually stop making requests as fast as you want when the numbers go up.

Any updates on this old issue? I am having the same problem. Should this be logged with Datadog? If yes, where exactly do we log it?
I am seeing the following highlighted k6 metrics (http_req_duration.avg and the http_req_failed count as a percentage) in the TeamCity build log:

However, Datadog shows different values. I am not sure where it gets the http_req_failed number from; it seems totally unrelated to the percentage above.
k6.http_req_duration.avg → Avg = 59.1s. k6.http_req_failed → Avg = 4.76
I use the following docker run command:
K6_STATSD_ENABLE_TAGS=true K6_STATSD_ADDR=cypress-dd-agent:8125 K6_STATSD_PUSH_INTERVAL=10ms k6 run…
I reduced the push interval to 10 ms after reading this issue: Missing data with statsd output - #6 by mstoykov - OSS Support - Grafana Labs Community Forums
However, I am stuck as to what else can be done to make the k6 build log metrics match the Datadog metrics.
All my Datadog graphs show a close but different number than the k6 build logs.
Since no one is working on this issue, is migrating to Grafana a better solution than trying to resolve this Datadog issue?
Please guide or fix this issue, as it is blocking our usage of k6 and Datadog.