Consistent "broken pipe" error from vmselect when executing PromQL with a longer time range from Grafana.

Issue
A “broken pipe” error is thrown when executing a PromQL query (from Grafana) over a long time range. The broken pipe occurs on the connection between Grafana and vmselect, usually 3-5s after the query is issued. Grafana appears to close the connection while vmselect is still writing results to it.

To Reproduce
Run the following query from Grafana:

sum by (kubernetes_cluster,envoy_cluster_name) (increase(envoy_cluster_upstream_rq{envoy_response_code=~"(5..|429)"}[100d])) > 10

Expected behavior
The query returns results.

Actual behavior
The query times out after 3-5s (see the screenshot below).

Logs

vmselect

severity: "ERROR"
textPayload: "2022-03-16T13:36:10.446Z warn VictoriaMetrics/app/vmselect/main.go:563 error in "/select/0/prometheus/api/v1/query_range?end=1647437400&query=envoy_cluster_upstream_rq%5B7776000s%5D&start=1639661400&step=3600": error when executing query="envoy_cluster_upstream_rq[7776000s]" on the time range (start=1639661400000, end=1647437400000, step=3600000): cannot send query range response to remote client: cannot send 2 bytes to client: write tcp4 10.0.0.24:8481->10.0.7.148:38124: write: broken pipe"
timestamp: "2022-03-16T13:36:10.446458671Z"

Screenshots
[Screenshot: the query timing out in Grafana after 3-5s]

Version
vmselect:v1.72.0
Grafana version v8.3.3 (30bb7a93ca)

Parameters

  • search.maxQueryDuration = 30s [VM]
  • grafana.client.timeout = 20s [Grafana Helm]
  • GrafanaDataSource jsonData:timeInterval = 25s (see the CRD below).

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: cluster-victoriametrics
spec:
  name: victoriametrics.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://namespace.cluster:<port>/select/0/prometheus/
      isDefault: true
      version: 1
      editable: false
      jsonData:
        tlsSkipVerify: true
        timeInterval: "25s"
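For comparison, the VictoriaMetrics side of the parameters listed above is a vmselect command-line flag rather than a Grafana setting. A minimal sketch of passing `-search.maxQueryDuration` in a Kubernetes pod template, assuming vmselect runs as a plain Deployment; the container name is hypothetical, and other required flags such as `-storageNode` are omitted:

```yaml
# Fragment of a Deployment pod template (hypothetical container name).
# Required flags such as -storageNode are omitted for brevity.
spec:
  containers:
    - name: vmselect
      image: victoriametrics/vmselect:v1.72.0-cluster
      args:
        # Upper bound on how long vmselect spends on a single query
        # (the 30s value listed in the parameters above).
        - -search.maxQueryDuration=30s
      ports:
        # vmselect's HTTP port, as seen in the log above.
        - containerPort: 8481
```

This only bounds query execution inside vmselect; it does not control how long Grafana waits for the response.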

Question
I expected the GrafanaDataSource jsonData:timeInterval of 25s (above) to apply to the vmselect request, but Grafana seems to time out within 3-5s. What is the correct timeout to set on the Grafana side for VictoriaMetrics PromQL queries?

Appreciate any help on this.

Hi @pcperera! Do you see the same error when using curl or vmui for executing this query?


Hi @hagen

Thanks for the response.

No, the query succeeds every time in vmui.

@hagen and all,

Doesn’t timeInterval govern the data source timeout? The Grafana query still times out after 3-5s.

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: cluster-victoriametrics
spec:
  name: victoriametrics.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://vmselect.vmselect:8481/select/0/prometheus/
      isDefault: true
      version: 1
      editable: false
      jsonData:
        tlsSkipVerify: true
        timeInterval: "30s"

I believe timeInterval means something different:

Lowest interval/step value that should be used for this data source.

You are probably looking for the timeout setting:

Request timeout in seconds. Overrides dataproxy.timeout option

See more details in the Grafana documentation: Provision Grafana.
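A minimal sketch of the data source above in Grafana's native provisioning file format (not the grafana-operator CRD), with `jsonData.timeout` added. The value 60 is an arbitrary example. Note that Grafana's global `[dataproxy] timeout` already defaults to 30 seconds, so this setting may not by itself explain a failure after only 3-5s.

```yaml
# Grafana data source provisioning file (native format, not the operator CRD).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://vmselect.vmselect:8481/select/0/prometheus/
    isDefault: true
    version: 1
    editable: false
    jsonData:
      tlsSkipVerify: true
      # Lowest interval/step for this data source; not a timeout.
      timeInterval: "30s"
      # Request timeout in seconds; overrides the global dataproxy.timeout
      # for this data source only. 60 is an example value.
      timeout: 60
```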

@hagen

I don’t see a timeout field for the GrafanaDataSource CRD in grafana-operator/api.md (master branch of grafana-operator/grafana-operator on GitHub).

Am I missing something here?

Thanks
Priyanka
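Because the per-data-source `timeout` overrides Grafana's global `[dataproxy] timeout`, one alternative when the CRD does not expose `timeout` is to set the global value instead. A minimal sketch, assuming the installed grafana-operator version's `Grafana` CR maps `spec.config.dataproxy` to the `[dataproxy]` section of grafana.ini (verify this against the operator's api.md for your version); the resource name and the value 60 are hypothetical:

```yaml
# Assumes spec.config.dataproxy is supported by the installed operator version.
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: grafana            # hypothetical resource name
spec:
  config:
    dataproxy:
      # Global data proxy timeout in seconds (Grafana's default is 30).
      timeout: 60
```

This raises the limit for every proxied data source, not just VictoriaMetrics.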