Noisy error logs in distributor component

I am seeing some error logs from the distributors on the v1.1.0-rc release. All ingesters are active and healthy, and there are no corresponding errors in the ingester logs.

level=error ts=2021-08-23T10:46:10.157097875Z caller=log.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unavailable desc = transport is closing"
level=warn ts=2021-08-23T10:46:09.836463272Z caller=pool.go:185 msg="removing distributor_pool failing healthcheck" addr=10.58.21.174:9095 reason="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-08-23T10:46:05.813528284Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: read tcp 10.58.14.31:42490->10.58.44.61:7946: i/o timeout"
level=warn ts=2021-08-23T10:46:02.153600971Z caller=pool.go:185 msg="removing distributor_pool failing healthcheck" addr=10.58.26.235:9095 reason="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
level=error ts=2021-08-23T10:45:55.258558135Z caller=log.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unavailable desc = transport is closing"
level=warn ts=2021-08-23T10:45:55.048372224Z caller=pool.go:185 msg="removing distributor_pool failing healthcheck" addr=10.58.26.41:9095 reason="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
level=error ts=2021-08-23T10:45:35.558405519Z caller=log.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unavailable desc = transport is closing"

Are things generally working but you’re still getting that error, or is nothing working?

If it’s the former, I’d look at this metric to make sure that pushes aren’t taking too long:

histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{route=~"/tempopb.Pusher/Push.*"}[$__rate_interval])) by (le))
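If you want to run that check outside a Grafana panel, you can hit the Prometheus HTTP API directly. A minimal sketch, assuming a Prometheus instance at `http://prometheus:9090`; note `$__rate_interval` is a Grafana dashboard variable, so it is replaced here with a concrete window (5m):

```python
# Build an instant-query URL for the p99 Push latency check against
# Prometheus's HTTP API (GET /api/v1/query?query=...).
# PROM_URL and the 5m rate window are assumptions; adjust for your setup.
import urllib.parse

PROM_URL = "http://prometheus:9090"

query = (
    'histogram_quantile(.99, sum(rate('
    'tempo_request_duration_seconds_bucket'
    '{route=~"/tempopb.Pusher/Push.*"}[5m])) by (le))'
)

# Percent-encode the PromQL so braces, quotes, and regex chars survive the URL.
url = PROM_URL + "/api/v1/query?query=" + urllib.parse.quote(query)
print(url)
```

You can then fetch that URL with curl or any HTTP client; a sustained p99 near your gRPC deadline would explain the DeadlineExceeded healthcheck removals.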

If it’s the latter, I’d debug the network connection between distributors and ingesters.
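As a starting point for that network debugging, a quick TCP reachability check from a distributor host toward an ingester covers both ports that show up in the logs above. A minimal sketch; the example address is taken from the log lines and is an assumption:

```python
# Check whether distributor -> ingester TCP connections succeed at all.
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 9095 is the gRPC port the distributor pushes traces to;
# 7946 is the memberlist gossip port from the "Failed fallback ping" error.
for port in (9095, 7946):
    print(port, tcp_reachable("10.58.21.174", port))
```

If 9095 is unreachable you would expect exactly the DeadlineExceeded/Unavailable errors shown; if only 7946 fails, the problem is limited to memberlist gossip.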

We see this message logged as well, but only when rolling out a new set of ingesters.