Query error: upstream request timeout

Hello,

I'm getting the following timeout when I run a query that does a large search over the logs:
(screenshot: "upstream request timeout" error)
The timeout always hits at exactly 15s.

It is true that if I make the query with labels it does not time out, because label matchers make the query much more efficient. But I would like the query not to time out even when it is made without labels and takes a long time to complete.

FYI: I have already increased all the timeouts in the config file:

```
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_server_max_recv_msg_size: 12667131    #<int> | default = 4194304
  grpc_server_max_send_msg_size: 12667131    #<int> | default = 4194304
  http_server_read_timeout: 10m
  http_server_write_timeout: 10m
  grpc_server_min_time_between_pings: 40s

querier:
  query_timeout: 5m0s
  tail_max_duration: 1h0m0s
  engine:
    timeout: 5m0s
    max_look_back_period: 30s

distributor:
  ring:
    kvstore:
      store: inmemory
      etcd:
        dial_timeout: 40s
      consul:
        http_client_timeout: 40s

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
        consul:
          http_client_timeout: 180s
        etcd:
          dial_timeout: 180s
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 8572864  # Loki will attempt to build chunks up to this size (~8 MB), flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
  concurrent_flushes: 64
  flush_op_timeout: 10m
  wal:
    dir: "/tmp/wal"

limits_config:
  ingestion_rate_strategy: global
  ingestion_rate_mb: 5
  ingestion_burst_size_mb: 1
  enforce_metric_name: false
  max_global_streams_per_user: 5000
  per_stream_rate_limit: 3MB
  per_stream_rate_limit_burst: 15MB
  max_query_length: 31d
  cardinality_limit: 10000
  max_cache_freshness_per_query: 10m
  split_queries_by_interval: 30m
  ruler_remote_write_url: ""
  ruler_remote_write_timeout: 40s
  ruler_remote_write_headers: {}
  ruler_remote_write_queue_capacity: 0
  ruler_remote_write_queue_min_shards: 0
  ruler_remote_write_queue_max_shards: 0
  ruler_remote_write_queue_max_samples_per_send: 0
  ruler_remote_write_queue_batch_send_deadline: 40s
  ruler_remote_write_queue_min_backoff: 0s
  ruler_remote_write_queue_max_backoff: 0s
  ruler_remote_write_queue_retry_on_ratelimit: false
  retention_period: 35d
  per_tenant_override_period: 40s


schema_config:
  configs:
    - from: 2021-05-26
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
  aws:
    bucketnames: bucket-name
    endpoint: s3.{AWS_REGION}.amazonaws.com
    region: {AWS_REGION}
    insecure: false
    sse_encryption: false
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: false
    backoff_config:
      max_period: 40s
    s3forcepathstyle: true

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: s3

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules
  rule_path: /tmp/scratch
  alertmanager_url: http://alert-manager.control.private:9093
  ring:
    kvstore:
      store: inmemory
      etcd:
        dial_timeout: 180s
      consul:
        http_client_timeout: 180s
  enable_api: true
  enable_alertmanager_v2: true
```

Did you check the timeout configuration for your Loki data source in Grafana?

Yes, it is set to 90s.
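For reference, that 90s can also be set through data source provisioning instead of the UI. A minimal sketch, assuming the standard `jsonData.timeout` field (HTTP request timeout in seconds) and a placeholder Loki URL:

```
# Hypothetical provisioning file, e.g. /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100    # assumed Loki address
    jsonData:
      timeout: 90            # request timeout in seconds, matching the 90s set in the UI
      maxLines: 1000
```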

Hi @unaigil,
I'm facing the same problem.
Long-running Loki queries are being timed out after 15s in Grafana.

Have you found a solution for your problem?

In our case we are using Contour as the ingress controller, and Envoy has a default response timeout of 15s. I increased it to 2m by adding the ingress annotation below, which resolved the issue.

projectcontour.io/response-timeout: 2m
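For anyone else on Contour, the annotation goes on the Ingress that fronts the service hitting the 15s limit. A minimal sketch with placeholder names and host; only the annotation itself comes from the fix above, everything else is an assumed example:

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana                                 # assumed name
  annotations:
    projectcontour.io/response-timeout: 2m      # raise Envoy's 15s default response timeout
spec:
  rules:
    - host: grafana.example.com                 # assumed host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana                   # assumed backend service
                port:
                  number: 3000
```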

We are hitting the same issue. Who is the upstream here, the Loki read cluster?