Grafana Agent health check

Grafana OSS v10.1, Loki, and Prometheus are installed and running properly.
The basic settings (data sources, dashboards, alert rules, etc.) are complete and working normally.
More than 1000 servers (Windows / Linux) have Grafana Agent installed and report metrics and logs to Prometheus and Loki.
However, I am having trouble monitoring the health of Grafana Agent itself.
When Grafana Agent stops or crashes unexpectedly, for whatever reason, I have no way to detect it.
What is the proper way to monitor agent health?


integrations:
  node_exporter:
    enabled: true
    scrape_interval: 60s
    scrape_timeout: 30s
    disable_collectors:
      - ipvs
      - btrfs
      - infiniband
    netclass_ignored_devices: "^(veth.*|cali.*|[a-f0-9]{15})$"
    netdev_device_exclude: "^(veth.*|cali.*|[a-f0-9]{15})$"
    filesystem_fs_types_exclude: "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
    metric_relabel_configs:
      - action: drop
        regex: node_scrape_collector_.+
        source_labels: [__name__]
    relabel_configs:
      - replacement: tbchostname.rhidm.net
        target_label: instance
  prometheus_remote_write:
    - url: https://kenntun-monitor.asuscomm.com:9090/api/v1/write
      remote_timeout: 30s
      basic_auth:
        username: padmin
        password: XXX
      tls_config:
        insecure_skip_verify: true
      queue_config:
        batch_send_deadline: 60s
  agent:
    enabled: true
    relabel_configs:
      - action: replace
        source_labels:
          - agent_hostname
        target_label: instance
      - action: replace
        target_label: job
        replacement: "integrations/agent"
    metric_relabel_configs:
      - action: keep
        regex: (prometheus_target_.*|prometheus_sd_discovered_targets|agent_build.*|agent_wal_samples_appended_total|process_start_time_seconds)
        source_labels:
          - __name__
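
For reference, here is the kind of check I have in mind, as an untested sketch. Since agents push via remote write, a stopped agent shows up only as series that disappear, so the rule compares which instances were reporting an hour ago against which are reporting now. It assumes the `agent` integration exposes `agent_build_info` (which the keep regex above should let through) under `job="integrations/agent"`, and that the rule is loaded by Prometheus or an equivalent ruler. The group name, alert name, 1h offset, `for` duration, and severity label are placeholders I made up.

```yaml
groups:
  - name: grafana-agent-health            # hypothetical group name
    rules:
      - alert: GrafanaAgentNotReporting   # hypothetical alert name
        # Instances that had the series 1h ago but have none now.
        # count by (instance) reduces both sides to the instance label,
        # so a changed version label does not break the match.
        expr: |
          count by (instance) (agent_build_info{job="integrations/agent"} offset 1h)
          unless
          count by (instance) (agent_build_info{job="integrations/agent"})
        for: 5m                           # assumed grace period
        labels:
          severity: warning               # assumed label
        annotations:
          summary: "Grafana Agent on {{ $labels.instance }} has stopped reporting"
```

One caveat with this pattern: the expression stops matching once the offset window has passed, so the alert would resolve on its own roughly an hour after the agent disappears. A longer offset keeps it firing longer, at the cost of also flagging servers that were decommissioned on purpose.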
