Consistent "broken pipe" error from vmselect when executing PromQL with a longer time range from Grafana.

pcperera · March 23, 2022, 6:36pm

Issue
A “broken pipe” error is thrown when executing PromQL from Grafana with a longer time range. The broken pipe occurs between Grafana and vmselect, usually 3-5s into the query. Grafana appears to terminate the network connection while vmselect is still attempting to write results to it.

To Reproduce
Run the query below from Grafana.

sum by (kubernetes_cluster,envoy_cluster_name) (increase(envoy_cluster_upstream_rq{envoy_response_code=~"(5..|429)"}[100d])) > 10

Expected behavior
The PromQL query returns results.

Actual behavior
The PromQL query times out after 3-5s (see the screenshot below).

Logs

vmselect

severity: "ERROR"
textPayload: "2022-03-16T13:36:10.446Z warn VictoriaMetrics/app/vmselect/main.go:563 error in "/select/0/prometheus/api/v1/query_range?end=1647437400&query=envoy_cluster_upstream_rq%5B7776000s%5D&start=1639661400&step=3600": error when executing query="envoy_cluster_upstream_rq[7776000s]" on the time range (start=1639661400000, end=1647437400000, step=3600000): cannot send query range response to remote client: cannot send 2 bytes to client: write tcp4 10.0.0.24:8481->10.0.7.148:38124: write: broken pipe"
timestamp: "2022-03-16T13:36:10.446458671Z"

Screenshots

Version
vmselect:v1.72.0
Grafana version v8.3.3 (30bb7a93ca)

Parameters

search.maxQueryDuration = 30s [VM]
grafana.client.timeout = 20s [Grafana Helm]
GrafanaDataSource jsonData:timeInterval = 25s (see the CRD below).

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: cluster-victoriametrics
spec:
  name: victoriametrics.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://namespace.cluster:<port>/select/0/prometheus/
      isDefault: true
      version: 1
      editable: false
      jsonData:
        tlsSkipVerify: true
        timeInterval: "25s"

Question
I expected the GrafanaDataSource jsonData:timeInterval of 25s (above) to apply to the vmselect invocation, but Grafana seems to time out within 3-5s. What is the correct timeout to set on the Grafana side for VictoriaMetrics PromQL queries?

pcperera · March 25, 2022, 6:31am

Appreciate any help on this.

hagen · March 26, 2022, 6:34pm

Hi @pcperera! Do you see the same error when using curl or vmui for executing this query?

pcperera · March 30, 2022, 11:53am

Hi @hagen

Thanks for the response.

No, the query succeeds repeatedly in vmui.

pcperera · March 30, 2022, 5:21pm

@hagen and all,

Doesn’t timeInterval govern the data source timeout? The Grafana query times out in 3-5s.

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: cluster-victoriametrics
spec:
  name: victoriametrics.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://vmselect.vmselect:8481/select/0/prometheus/
      isDefault: true
      version: 1
      editable: false
      jsonData:
        tlsSkipVerify: true
        timeInterval: "30s"

hagen · March 30, 2022, 7:24pm

I believe timeInterval stands for a different thing:

Lowest interval/step value that should be used for this data source.

You are probably looking for the timeout setting:

Request timeout in seconds. Overrides dataproxy.timeout option

See more details here: Provision Grafana | Grafana documentation
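
To make the distinction concrete, here is a minimal sketch of how the two settings might sit side by side in a plain Grafana provisioning file. It assumes file-based provisioning rather than the grafana-operator CRD (whether the CRD passes timeout through is not confirmed here), and the value 60 is purely illustrative.

```yaml
# Sketch of a Grafana data source provisioning file (assumed layout, not the operator CRD).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://vmselect.vmselect:8481/select/0/prometheus/
    jsonData:
      # Lowest interval/step Grafana should use for this data source; this is not a timeout.
      timeInterval: "30s"
      # Request timeout in seconds; overrides dataproxy.timeout. 60 is an illustrative value.
      timeout: 60
```

With a layout like this, timeInterval only affects the minimum step of the generated queries, while timeout is the setting that bounds how long Grafana waits for vmselect to respond.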

pcperera · April 1, 2022, 5:27am

@hagen

I don’t see a timeout field in the GrafanaDataSource CRD described in grafana-operator/api.md at master · grafana-operator/grafana-operator · GitHub.

Am I missing anything here?

Thanks
Priyanka
