Error forwarding metrics with Alloy to Mimir

I am having issues forwarding metrics to Mimir using Alloy. My setup is as follows:

  • RKE2 v1.31
  • Grafana LGTM 2.1.0
  • Grafana Alloy 0.9.2
  • kube-state-metrics
  • cilium v1.16.2
    • kube-proxy replacement
    • ingress controller
    • l2 announcement
    • gateway API
    • clustermesh (type LoadBalancer)
  • OpenEBS
  • prometheus CRDs

For the LGTM stack I made a few minor modifications, such as changing the dnsService for RKE2. In Alloy I enabled the varlog and dockercontainers mounts for collecting logs (Loki is working fine); other than that I did not make any changes. As a new user I can not upload files, so here is the gist of my Alloy config.
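
To give an idea of what the gist contains, the metrics pipeline is roughly the following (simplified from memory; the remote_write URL is what I believe is the chart's default Mimir nginx gateway, so treat the exact names as approximate):

  discovery.kubernetes "pods" {
    role = "pod"
  }

  discovery.kubernetes "services" {
    role = "service"
  }

  prometheus.scrape "pods" {
    targets    = discovery.kubernetes.pods.targets
    forward_to = [prometheus.remote_write.mimir.receiver]
  }

  prometheus.scrape "services" {
    targets    = discovery.kubernetes.services.targets
    forward_to = [prometheus.remote_write.mimir.receiver]
  }

  prometheus.remote_write "mimir" {
    endpoint {
      // service name is from my release; may differ in other installations
      url = "http://lgtm-distributed-mimir-nginx.monitoring.svc:80/api/v1/push"
    }
  }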

These are the errors I am getting; I am not sure what to make of them:

ts=2024-10-26T17:08:07.813508529Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.services scrape_pool=prometheus.scrape.services target=http://lgtm-distributed-tempo-query-frontend.monitoring.svc:9095/metrics err="Get \"http://lgtm-distributed-tempo-query-frontend.monitoring.svc:9095/metrics\": net/http: HTTP/1.x transport connection broken: malformed HTTP response \"\\x00\\x00\\f\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x05\\x00\\x00@\\x00\\x00\\x03\\x00\\x00\\x00d\""
ts=2024-10-26T17:08:07.841161424Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.pods scrape_pool=prometheus.scrape.pods target=http://192.168.1.120:80/metrics err="Get \"http://192.168.1.120:80/metrics\": dial tcp 192.168.1.120:80: connect: connection refused"
ts=2024-10-26T17:08:07.880070607Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.services scrape_pool=prometheus.scrape.services target=http://lgtm-distributed-mimir-compactor.monitoring.svc:9095/metrics err="Get \"http://lgtm-distributed-mimir-compactor.monitoring.svc:9095/metrics\": net/http: HTTP/1.x transport connection broken: malformed HTTP response \"\\x00\\x00\\f\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x05\\x00\\x00@\\x00\\x00\\x03\\x00\\x00\\x00d\""
ts=2024-10-26T17:08:08.187541573Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.services scrape_pool=prometheus.scrape.services target=http://lgtm-distributed-tempo-distributor.monitoring.svc:9095/metrics err="Get \"http://lgtm-distributed-tempo-distributor.monitoring.svc:9095/metrics\": net/http: HTTP/1.x transport connection broken: malformed HTTP response \"\\x00\\x00\\f\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x05\\x00\\x00@\\x00\\x00\\x03\\x00\\x00\\x00d\""
ts=2024-10-26T17:08:08.340802804Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.services scrape_pool=prometheus.scrape.services target=http://lgtm-distributed-loki-querier-headless.monitoring.svc:9095/metrics err="Get \"http://lgtm-distributed-loki-querier-headless.monitoring.svc:9095/metrics\": net/http: HTTP/1.x transport connection broken: malformed HTTP response \"\\x00\\x00\\f\\x04\\x00\\x00\\x00\\x00\\x00\\x00\\x05\\x00\\x00@\\x00\\x00\\x03\\x00\\x00\\x00d\""
ts=2024-10-26T17:08:08.363253285Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.pods scrape_pool=prometheus.scrape.pods target=http://10.0.0.144:80/metrics err="Get \"http://10.0.0.144:80/metrics\": dial tcp 10.0.0.144:80: connect: connection refused"
ts=2024-10-26T17:08:08.451460337Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.pods scrape_pool=prometheus.scrape.pods target=http://10.0.0.144:80/metrics err="Get \"http://10.0.0.144:80/metrics\": dial tcp 10.0.0.144:80: connect: connection refused"
ts=2024-10-26T17:08:08.699649589Z level=debug msg="Scrape failed" component_path=/ component_id=prometheus.scrape.pods scrape_pool=prometheus.scrape.pods target=http://192.168.1.121:80/metrics err="Get \"http://192.168.1.121:80/metrics\": dial tcp 192.168.1.121:80: connect: connection refused"
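
From what I can tell, the 9095 targets are the gRPC ports of the Loki/Tempo/Mimir components, and the port 80 pod targets do not expose metrics at all, so these may just be noise from scraping every discovered port. I was considering adding a relabel rule along these lines to only keep ports named http-metrics (the label and port name are assumptions on my part), but I am not sure whether that is the right approach:

  discovery.relabel "services" {
    targets = discovery.kubernetes.services.targets

    // keep only service ports that look like Prometheus metrics ports
    rule {
      source_labels = ["__meta_kubernetes_service_port_name"]
      regex         = "http-metrics"
      action        = "keep"
    }
  }

and then pointing prometheus.scrape.services at discovery.relabel.services.output instead of the raw discovery targets.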

Is it writing to Loki at all? Permission issues with Minio?

I do see Loki logs when I explore the Loki datasource in Grafana, and I am seeing a small number of chunks in Minio (the default LGTM helm chart sets the console keys to grafana-mimir/supersecret). On this installation I did not create separate keys; I did that in another installation where I am using Ceph/Rook.

The Mimir distributor pod rejects every sample with an err-mimir-sample-out-of-order error. However, the nodes all have their time synced with chrony, and I added the following to my values, so I am not sure why this is happening:

  structuredConfig:
    limits:
      max_global_series_per_user: 20000000
      ingestion_rate: 200000
      ingestion_burst_size: 1000000
      out_of_order_time_window: 5m
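
For context, in my values file that block sits under the Mimir subchart; if I read the lgtm-distributed umbrella chart correctly the nesting is roughly this (the mimir/mimir path is my understanding of the chart, so please correct me if that is where the problem lies):

  # umbrella chart values: mimir-distributed is aliased to "mimir",
  # and its own config lives under a nested "mimir" key
  mimir:
    mimir:
      structuredConfig:
        limits:
          out_of_order_time_window: 5m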