Loki recording rule range query

Hi,

I am playing around with the ruler to create a recording rule. Somehow I am stuck now.

This is my rule:

groups:
  - name: CloudfrontBytesPerHost
    interval: 1m
    rules:
      - record: aws:cloufront:bytes:per:host
        expr: |
          sum by (
            x_host_header,x_edge_location,sc_status
          ) (
            sum_over_time(
              {job="logstash"} | pattern `<_>	<_>	<x_edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<_>	<cs_uri_stem>	<sc_status>	<cs_Referer>	<cs_User_Agent>	<cs_uri_query>	<cs_Cookie>	<x_edge_result_type>	<x_edge_request_id>	<_>	<cs_protocol>	<cs_bytes>	<time_taken>	<x_forwarded_for>	<ssl_protocol>	<ssl_cipher>	<x_edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<x_edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>` | unwrap sc_bytes [1m]
            )
          )

As you can see I am processing cloudfront logs. I pull them from an s3 bucket with logstash. Due to the nature of aws log and logstash these logs can be delays up to 5 minutes. I am now struggling with the query the ruler sends. It looks like it is an instant query. I end up with large gaps in my metrics.

How long is the instant range?
Is there a way to let the ruler query with a range-query?

Regards,

Hi @vquie

Is there a way to let the ruler query with a range-query?

There is not. The ruler has to perform an instant query because it is producing a metric to send to a Prometheus-compatible source, and it expects a single timestamp per metric series.

Due to the nature of aws log and logstash these logs can be delays up to 5 minutes.

There are a couple different ways to solve this.

Option 1: You could increase your group interval to 5m and set your query range to be 5m as well (in your sum_over_time aggregation). This will run your query every 5m and look back for a 5m window.

Option 2: Keep your query and interval as it is but configure the following limits_config:

ruler_evaluation_delay_duration

This setting will adjust the “instant” at which the query is evaluated to be further back in the past. You could set this to 5m, and all queries will be shifted back in time by 5m.

This option won’t give you a “true” result though because of the time adjustment, so a spike in logs at 12:00 would look like it’s happening at 12:05.

Hope that helps

Thank you very much.
I update the interval to 15s in the rule and the sum_over_time aggregation. Also I set the ruler_evaluation_delay_duration to 10min. The graphs look good so far.

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.