Explaining client errors (1050/1211/1220)

jhwj9617 · August 24, 2023, 6:59pm

Explaining client errors:

1050 - request timeout
1211 - dial i/o timeout
1220 - read: connection reset by peer

As part of our service health evaluation tool, we look at the HTTP responses of requests. But we don’t know how to handle response 0 (indicating a client error).

The most common ones we encounter are the ones listed above. How can we further understand whether it’s a service-side issue, a proxy/gateway issue, or a client issue? Our load agents aren’t fully optimized (e.g. TCP re-use, increasing the number of HTTP ports), but we don’t know whether this is the limitation or something else.

I’m looking for some info on how to interpret these errors better.

olegbespalov · August 28, 2023, 3:40pm

Hi @jhwj9617 !

We have a documentation page where we explain these codes: Error Codes.

Regarding the response 0: is there a case where Response.error_code also comes back as 0?

Cheers!

jhwj9617 · August 28, 2023, 4:14pm

I’m familiar with that page. The ‘0’ I’m referring to is the HTTP status, which is 0 and comes with a separate client error code (1050/1211/1220).

My question is whether the errors are the result of a service/network bottleneck or of a lack of optimization on the client agent (e.g. TCP re-use, starved ports). Are there separate errors for those? Or is it ambiguous whether it is a client issue or not?

olegbespalov · August 29, 2023, 2:15pm

“My question pertains to whether the errors are a result of service/network bottleneck, or lack of optimization on client agent (e.g. tcp re-use, starved ports). Are there separate errors for those? Or is it ambiguous if it is client issue or not.”

Unfortunately, it could be both. The status can only be determined once we have received something from the server; before that, it defaults to 0. But the error could be on the client side or still be some network error. I’d say that in that case the Response.error_code should be more explicit.

Hope that answers your question.
Cheers!

jhwj9617 · August 29, 2023, 6:12pm

@olegbespalov We have a Health evaluator tool which reads these metrics.

Right now we don’t have confidence whether it’s a genuine service-side issue or a client-side one. We don’t want to flag false positives.

If we don’t have optimized clients, I’m afraid these client errors are more pronounced. For now I’m thinking we filter out client errors to avoid false positives. Then, once we have confidence that the client side is rarely the issue, if at all, we can remove the filter and count ‘0’ status as failures as part of health evaluation.

The main point is that, when evaluating service health, we are mostly interested in service-side health and not in client issues.
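To make the filter idea concrete, here is a minimal sketch in plain JavaScript of how an evaluator could compute the success ratio with and without status-0 samples. The `successRatio` helper, the sample shape, and the rule that any status from 200 to 399 counts as a success are all assumptions for illustration, not how our tool or k6 actually works.

```javascript
// Hypothetical evaluator: computes the success ratio over response-like
// samples ({ status, error_code }), optionally ignoring status-0 samples.
function successRatio(samples, { excludeStatusZero = false } = {}) {
  const considered = excludeStatusZero
    ? samples.filter((s) => s.status !== 0)
    : samples;
  if (considered.length === 0) {
    return null; // nothing to evaluate
  }
  // Assumption: any status from 200 to 399 counts as a success.
  const ok = considered.filter((s) => s.status >= 200 && s.status < 400).length;
  return ok / considered.length;
}

// Made-up samples: three successful responses and one dial timeout.
const samples = [
  { status: 200, error_code: 0 },
  { status: 200, error_code: 0 },
  { status: 200, error_code: 0 },
  { status: 0, error_code: 1211 },
];

console.log(successRatio(samples)); // 0.75
console.log(successRatio(samples, { excludeStatusZero: true })); // 1
```

Reporting both numbers side by side would show how much of the gap to the SLO comes from status-0 responses alone.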

What do you think? What is the standard practice within the community?

olegbespalov · August 31, 2023, 12:16pm

Hey @jhwj9617

As I said, the status 0 could mean different things. It could be a misconfiguration of resources in the infrastructure or a load generator that isn’t optimized (e.g. when running large tests).

I’d still recommend checking what is inside the Response.error_code and taking some actions based on that knowledge. Maybe you could even emit a custom metric based on the Response.error_code and monitor it.

Also, to understand your case better: are these status 0 responses happening when trying to reach a certain RPS? And what percentage of requests do they make up?
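As a rough sketch of that idea (not an official k6 recipe), the function below maps Response.error_code to a coarse label that could be used as a tag on a custom Counter metric. The `classifyErrorCode` and `countByLabel` helpers, the label names, and the metric name in the comment are assumptions made for illustration; only the three codes discussed in this thread are mapped.

```javascript
// Hypothetical helper: maps a k6 Response.error_code to a coarse label.
// Only the codes from this thread are covered; anything else is "other".
function classifyErrorCode(errorCode) {
  switch (errorCode) {
    case 0:
      return 'none'; // no error code set
    case 1050:
      return 'request_timeout';
    case 1211:
      return 'dial_timeout';
    case 1220:
      return 'connection_reset';
    default:
      return 'other';
  }
}

// Counts labels over a list of response-like objects ({ status, error_code }).
function countByLabel(responses) {
  const counts = {};
  for (const res of responses) {
    const label = classifyErrorCode(res.error_code);
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Inside a k6 script, the same label could be attached to a custom metric:
//   import http from 'k6/http';
//   import { Counter } from 'k6/metrics';
//   const clientErrors = new Counter('client_errors'); // metric name is an assumption
//   const res = http.get(url);
//   if (res.status === 0) {
//     clientErrors.add(1, { kind: classifyErrorCode(res.error_code) });
//   }
```

Tagging by kind keeps dial timeouts, request timeouts, and resets in separate series, so each can be compared against load-generator and load-balancer metrics on its own.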

Cheers!

jhwj9617 · August 31, 2023, 7:58pm

Hi @olegbespalov

We do track the Response.error_code, and the most common are the ones I listed:

1050 - request timeout
1211 - dial i/o timeout
1220 - read: connection reset by peer

That’s what I’m trying to get at. For these cases, do I need more information, or can I make judgements about our system (client/network/service) based on them?

They come up intermittently: less than 1% of the time, but often enough that it affects our target SLOs (99.95%). They show up both at low RPS (~10 RPS) and at high RPS (~1000 RPS).

jhwj9617 · September 6, 2023, 8:43pm

@olegbespalov

Any info on how to interpret these error codes better?

olegbespalov · September 7, 2023, 6:10am

Hi @jhwj9617

To be honest, neither 10 RPS nor 1000 RPS sounds high, so it’s highly likely the issue is with the system under test (or something in between), and it deserves a closer investigation.

1050 is the request timeout. Basically, k6 made a request, but there was no response within the defined timeout (for HTTP, 60s by default).
1211 means k6 wasn’t even able to make an HTTP request, because the TCP connection to the target system could not be established in time.
1220 is caused by the target system resetting the TCP connection. It typically happens when the load balancer or the server itself isn’t able to handle the traffic.

So in theory, you could try to reduce the number of 1050 errors by increasing the timeout setting. However, it is also worth looking into the metrics or logs of the LB to see if anything looks suspicious.
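For the timeout part, a minimal sketch could look like the following. The `withTimeout` helper is hypothetical and the URL in the comment is a placeholder; the only k6 feature assumed is the `timeout` field of the request params, which accepts a duration string such as '120s'.

```javascript
// Hypothetical helper: builds the params object for a k6 HTTP call with a
// custom request timeout, expressed as a duration string.
function withTimeout(seconds, params = {}) {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new RangeError('timeout must be a positive number of seconds');
  }
  return { ...params, timeout: `${seconds}s` };
}

// In a k6 script this would be passed as the params argument:
//   import http from 'k6/http';
//   const res = http.get('https://example.com/', withTimeout(120)); // placeholder URL
//   if (res.error_code === 1050) {
//     // the request still timed out, now after 120s instead of the default 60s
//   }
```

A longer timeout only changes how slow responses are reported; it does not make the target respond faster, so it is worth comparing response-time percentiles before and after the change.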

Cheers!
