Loki Ruler Remote Evaluation implementation help

medampudi1 · July 29, 2024, 12:54pm

Hello team,
We are running a cluster large enough to handle more than 1 TB of uncompressed logs, plus a large number of metrics.
We are using remote rule evaluation (LIDs - 0002 Remote Rule Evaluation) to execute rules, but it does not seem stable and we are getting the following error.

loki-loki-distributed-ruler-749445c967-5t89k ts=2024-07-28T08:02:14.470716326Z caller=spanlogger.go:86 component=ruler evaluation_mode=remote user=platformpii method=ruler.remoteEvaluation.Query level=warn query_hash=1032185798 query="count_over_time ({container_name=\"pii-vault\"} |= \"latency\"[1m])" instant=2024-07-28T08:02:09.236487818Z response_time=1.797359754s msg="failed to evaluate rule" err="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (8393682 vs. 4194304)"

We changed the gRPC settings for ruler_client, query_scheduler.grpc_client_config, ingester_client.grpc_client_config, index_gateway_client.grpc_client_config, frontend_worker.grpc_client_config, and frontend.grpc_client_config to raise the limit from the 4 MB default to about 104 MB (104857600 bytes, i.e. 100 MiB), but we are still getting this error.
Below is our ruler configuration, with sensitive values redacted.
Can you please help us stabilize our Grafana stack?

ruler:
  enable_sharding: true
  query_stats_enabled: true
  evaluation_interval: 10s
  evaluation:
    max_jitter: 2s
    mode: remote
    query_frontend:
      address: "dns:///loki-loki-distributed-query-frontend-headless:9095"

  ruler_client:
    grpc_compression: snappy
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600

  wal:
    dir: ruler-wal
  wal_cleaner:
    period: 1h
  storage:
    type: s3
    s3:
      bucketnames: redacted
      endpoint: null
      region: region
      access_key_id: Redacted
      secret_access_key: redacted

  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  alertmanager_url: http://mimir-alertmanager:8080
  external_url: redacted
  remote_write:
    enabled: true
    clients:
      mimir:
        url: http://mimir-nginx/api/v1/push

tonyswumac · July 30, 2024, 4:49pm

Did you change the server gRPC size as well?
Have you considered aggregating your metrics a bit as well? I suspect you don't intend to send 8 MB worth of messages through your ruler.

medampudi1 · August 14, 2024, 6:02pm

Yes, the server gRPC sizes are:
grpc_server_max_recv_msg_size: 1048576000
grpc_server_max_send_msg_size: 1048576000

By aggregated metrics, do you mean something like the count_over_time function? That is already what the rule uses for this value. Are you telling me the query is wrong? It seems that the data being queried is about 8 MB, but I am hitting the 4 MB limit somewhere in the chain of the remote recording rule implementation.
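
For reference, the byte counts in the error and in the settings quoted in this thread can be checked with a short script. This is only a sketch: the helper names parse_grpc_size_error and to_mib are made up for illustration, and the two configured values are copied from the posts above. It shows that the limit reported in the error is exactly 4 MiB, which is the gRPC library default for received messages, and that none of the quoted settings equals it. That would be consistent with some client in the remote evaluation path still running with the default, though the script itself cannot tell which one.

```python
import re

MIB = 1024 * 1024


def parse_grpc_size_error(message):
    """Return (actual_bytes, limit_bytes) from a gRPC 'larger than max' error, or None."""
    match = re.search(r"received message larger than max \((\d+) vs\. (\d+)\)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


# Error text copied from the ruler log line in the first post.
err = (
    "rpc error: code = ResourceExhausted desc = grpc: "
    "received message larger than max (8393682 vs. 4194304)"
)
actual, limit = parse_grpc_size_error(err)

# Values copied from the configuration quoted in this thread.
configured = {
    "ruler_client.max_recv_msg_size": 104857600,
    "grpc_server_max_recv_msg_size": 1048576000,
}

print(f"response size: {actual} bytes = {to_mib(actual):.2f} MiB")
print(f"limit hit:     {limit} bytes = {to_mib(limit):.2f} MiB")
for name, value in configured.items():
    status = "matches" if value == limit else "does not match"
    print(f"{name} = {to_mib(value):.0f} MiB ({status} the limit in the error)")
```

Because neither configured value is 4194304, the limit being hit is not one of the settings listed so far.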

system · August 14, 2025, 6:03pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
