Monitor Grafana Mimir with dashboards in a bare-metal deployment

  • What Grafana version and what operating system are you using?
    grafana-8.0.2-1.x86_64.rpm
    CentOS Linux release 7.9.2009 (Core)

My architecture is: Grafana Agent (scraping its own metrics) → Nginx (load balancer forwarding to the Mimir cluster) → Grafana Mimir cluster (three nodes running in a bare-metal deployment) → Grafana

  • What are you trying to achieve?
    I’m learning Grafana Mimir and trying to follow the monitor-grafana-mimir section of the documentation to create the Grafana Mimir dashboards.

  • How are you trying to achieve it?
    Following the Grafana documentation, I collected metrics as described in [Collecting metrics and logs from Grafana Mimir | Grafana Mimir documentation], but that page only gives a Kubernetes example config. So I wrote /etc/grafana-agent.yaml myself, as follows:

server:
  log_level: warn

metrics:
  wal_directory: /tmp/wal
  global:
    remote_write:
      - url: http://load-balancer:9009/api/v1/push
        headers:
          X-Scope-OrgID: mimir-monitor

integrations:
  agent:
    enabled: true
  node_exporter:
    enabled: true
    include_exporter_metrics: true
    disable_collectors:
      - "mdadm"

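Note that the config above only enables the agent and node_exporter integrations, so nothing in it scrapes the Mimir processes' own /metrics endpoints. A minimal sketch of an additional scrape job is shown here, not something I have applied or tested. The hostnames mimir-1 to mimir-3, port 8080 (Mimir's default http_listen_port, which may differ on my nodes), the instance name, and the cluster/namespace/job values are all assumptions; as far as I understand the dashboards requirements page, the job label is expected to look like <namespace>/<component>.

```yaml
metrics:
  # wal_directory and global.remote_write stay as in the config above
  configs:
    - name: mimir                      # hypothetical instance name
      scrape_configs:
        - job_name: baremetal/mimir    # assumed "<namespace>/<component>" form
          static_configs:
            - targets:                 # hypothetical host:port of each Mimir node
                - mimir-1:8080
                - mimir-2:8080
                - mimir-3:8080
              labels:
                cluster: baremetal     # assumed label values
                namespace: baremetal
```

If something like this is needed, the label values would presumably have to match whatever the re-compiled dashboards select on.
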
Then, because I’m not using Kubernetes to run Grafana Mimir, I followed the steps in [About Grafana Mimir dashboards and alerts requirements | Grafana Mimir documentation] to re-compile the dashboard JSON files.

  • What happened?
    After I imported the re-compiled dashboards into Grafana, they didn’t work as expected. When I ran queries to look for Grafana Mimir metrics, none of the metrics I expected were there.

  • What did you expect to happen?
    I expected to see data after importing the re-compiled dashboards. I’m not sure whether the problem is in the Grafana Agent settings or in the re-compiled dashboards.

  • Can you copy/paste the configuration(s) that you are having problems with?
    As above.

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
    I saw some errors while running Grafana Agent, but I’m sure Grafana Agent is sending metrics to Grafana Mimir because there is a tenant directory called ‘mimir-monitor’ in Mimir’s TSDB.

Jan 06 17:34:33 a73 grafana-agent[30208]: ts=2023-01-06T09:34:33.277745901Z caller=dedupe.go:112 agent=prometheus instance=6983b86a53ab512daa671edbef791e3c component=remote level=error remote_name=6983b8-637728 url=http://load-balancer:9009/api/v1/push msg="non-recoverable error" count=29 exemplarCount=0 err="server returned HTTP status 400 Bad Request: user=mimir-monitor: the sample has been rejected because another sample with a more recent timestamp has already been ingested and out-of-order samples are not allowed (err-mimir-sample-out-of-order). The affected sample has timestamp 2023-01-06T09:32:27.05Z and is from series {__name__=\"node_disk_io_now\", agent_hostname=\"a73\", device=\"sda\", instance=\"a73:9090\", job=\"integrations/node_exporter\"}"
Jan 06 17:35:00 a73 grafana-agent[30208]: ts=2023-01-06T09:35:00.879909538Z caller=dedupe.go:112 agent=prometheus instance=6983b86a53ab512daa671edbef791e3c component=remote level=warn remote_name=6983b8-637728 url=http://load-balancer:9009/api/v1/push msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
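
The first error means Mimir rejected samples that were older than the newest sample already ingested for that series. Recent Mimir releases have an experimental per-tenant limit that accepts out-of-order samples within a time window. The fragment below is only a sketch of that setting, assuming my Mimir version supports it; the 5m value is an arbitrary example, and it would only mask the symptom if the real cause is more than one agent writing the same series.

```yaml
# Fragment of the Mimir configuration file (not grafana-agent.yaml)
limits:
  # Assumed example value: accept samples up to 5 minutes older than
  # the newest sample already ingested for the same series
  out_of_order_time_window: 5m
```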

  • Did you follow any online instructions? If so, what is the URL?
    As above.