[Ruler] Unable to fetch metrics that exist


I’ve recently set up a Loki stack using Tanka, templated into Kubernetes manifests which are then applied to a standard cluster namespace. I am using GCS as the object store, and the components connect to it without issue.

Here’s a lovely screenshot of my working setup in Grafana:

I tried setting up the ruler, with the same query loaded as a rule. The rule was loaded into the ruler using cortextool:

namespace: rules
groups:
  - name: grafana_bang
    rules:
      - alert: bang
        expr: sum by (org_id) (rate({job="loki/ruler"}[1h]) > 0)
        labels:
          severity: critical
        annotations:
          summary: This should fire
export CORTEX_TENANT_ID=fake
./cortextool rules sync --rule-dirs=./rules --backend=loki --address http://localhost:3100

INFO[0000] updating group                                group=grafana_bang namespace=rules

Sync Summary: 1 Groups Created, 0 Groups Updated, 0 Groups Deleted
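In case it helps anyone following along, this is roughly how I confirmed the rule actually landed in the ruler (same local address as above; output will vary by setup):

```shell
# Same tenant as used for the sync
export CORTEX_TENANT_ID=fake

# List the rule groups the ruler knows about for this tenant
./cortextool rules list --backend=loki --address=http://localhost:3100

# Print the full rule definitions
./cortextool rules print --backend=loki --address=http://localhost:3100

# Or hit the ruler's Prometheus-compatible API directly
curl -s -H "X-Scope-OrgID: fake" http://localhost:3100/prometheus/api/v1/rules
```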

The alert did not fire. To check that there wasn’t anything wrong with my Alertmanager setup, I set a rule (1+1) which was guaranteed to fire. The ruler successfully triggered Alertmanager.

Setting logging to debug on the querier, I found the following:

level=debug ts=2022-08-02T14:29:50.603160265Z caller=async_store.go:81 msg="got chunk ids from ingester" count=0
ts=2022-08-02T14:29:50.604383714Z caller=spanlogger.go:80 org_id=fake method=query.Exec level=debug Ingester.TotalReached=2 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2022-08-02T14:29:50.604439904Z caller=spanlogger.go:80 org_id=fake method=query.Exec level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.ExecTime=2.830909ms Summary.QueueTime=0s
level=info ts=2022-08-02T14:29:50.604545064Z caller=metrics.go:133 component=ruler org_id=fake latency=fast query="sum by(org_id)((rate({job=\"loki/ruler\"}[1h]) > 0))" query_type=metric range_type=instant length=0s step=0s duration=2.830909ms status=200 limit=0 returned_lines=0 throughput=0B total_bytes=0B total_entries=0 queue_time=0s subqueries=1

If I am reading the above correctly, two ingesters were contacted (I have a replica count of 2), but neither returned any data for the query.

I tried fiddling with the querier settings block, setting query_store_only to true. This caused the Ingester.TotalReached value to drop to 0, but still no data was returned for the query.
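For reference, the setting I was toggling lives in the querier block of the Loki config (a sketch of the relevant excerpt, not my full config):

```yaml
querier:
  # When true, the querier skips the ingesters entirely and reads
  # only from the object store -- this is what zeroed out
  # Ingester.TotalReached in the debug logs above.
  query_store_only: true
```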

I think it’s important to note that the generated Tanka code sets up an nginx proxy, which I connect the Grafana data source to. When using logcli, I rely on this proxy to execute my queries. Querying my queriers directly returns no results, and there is no obvious reason why.
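For concreteness, this is roughly how I was querying in the two cases (hostnames are illustrative for my cluster, not part of any standard setup):

```shell
# Via the nginx proxy -- this path works and returns log lines
logcli query --addr=http://loki-nginx/ --org-id=fake '{job="loki/ruler"}'

# Directly against a querier pod -- this returns nothing for me
logcli query --addr=http://loki-querier:3100 --org-id=fake '{job="loki/ruler"}'
```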

When I set the log level on my ingesters to debug, there is an acknowledgement that a gRPC request from the ruler was handled, but no data appears to be returned.

Any suggestions as to whether I’ve gotten my configuration wrong would be appreciated.

I reimplemented my stack using the loki-distributed Helm chart.

The key difference was that the ring was implemented using memberlist instead of Consul.
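For anyone comparing configs, the ring section ends up looking roughly like this (field names are from Loki’s config reference; the exact values-file wiring depends on the chart version):

```yaml
# Gossip-based ring instead of Consul (sketch)
memberlist:
  join_members:
    - loki-memberlist   # headless service created by the chart (name illustrative)
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist   # previously: consul
```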

The alerts are firing now.