Hello, I have set up an agent using the “Add new connection” tab in Grafana Cloud (GC). I have it set up with the integrations for Linux vitals and Node.js vitals. Everything works and I receive all metrics.
I then added a new metric with the nodejs prometheus package to monitor HTTP request times. The /metrics endpoint correctly shows all the Node.js metrics and also shows the http_request_duration_seconds metric.
For some reason, I do not see it in the Metrics Explorer on GC. I added the metric name to the agent config. I'm not too sure what I’m doing in there yet, but I guessed that if the others are listed there I should add this one too. I also restarted the agent a couple of times.
Still nothing. Here are the relevant technical details:
Config file
integrations:
  node_exporter:
    enabled: true
    # disable unused collectors
    disable_collectors:
      - ipvs #high cardinality on kubelet
      - btrfs
      - infiniband
      - xfs
      - zfs
    # exclude dynamic interfaces
    netclass_ignored_devices: "^(veth.*|cali.*|[a-f0-9]{15})$"
    netdev_device_exclude: "^(veth.*|cali.*|[a-f0-9]{15})$"
    # disable tmpfs
    filesystem_fs_types_exclude: "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
    # drop extensive scrape statistics
    relabel_configs:
      - replacement: 'x'
        target_label: instance
    metric_relabel_configs:
      - action: drop
        regex: node_scrape_collector_.+
        source_labels:
          - __name__
      - action: keep
        regex: node_arp_entries|node_boot_time_seconds|node_context_switches_total|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_io_time_weighted_seconds_total|node_disk_read_bytes_total|node_disk_read_time_seconds_total|node_disk_reads_completed_total|node_disk_write_time_seconds_total|node_disk_writes_completed_total|node_disk_written_bytes_total|node_filefd_allocated|node_filefd_maximum|node_filesystem_avail_bytes|node_filesystem_device_error|node_filesystem_files|node_filesystem_files_free|node_filesystem_readonly|node_filesystem_size_bytes|node_intr_total|node_load1|node_load15|node_load5|node_md_disks|node_md_disks_required|node_memory_Active_anon_bytes|node_memory_Active_bytes|node_memory_Active_file_bytes|node_memory_AnonHugePages_bytes|node_memory_AnonPages_bytes|node_memory_Bounce_bytes|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_CommitLimit_bytes|node_memory_Committed_AS_bytes|node_memory_DirectMap1G_bytes|node_memory_DirectMap2M_bytes|node_memory_DirectMap4k_bytes|node_memory_Dirty_bytes|node_memory_HugePages_Free|node_memory_HugePages_Rsvd|node_memory_HugePages_Surp|node_memory_HugePages_Total|node_memory_Hugepagesize_bytes|node_memory_Inactive_anon_bytes|node_memory_Inactive_bytes|node_memory_Inactive_file_bytes|node_memory_Mapped_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_memory_SReclaimable_bytes|node_memory_SUnreclaim_bytes|node_memory_ShmemHugePages_bytes|node_memory_ShmemPmdMapped_bytes|node_memory_Shmem_bytes|node_memory_Slab_bytes|node_memory_SwapTotal_bytes|node_memory_VmallocChunk_bytes|node_memory_VmallocTotal_bytes|node_memory_VmallocUsed_bytes|node_memory_WritebackTmp_bytes|node_memory_Writeback_bytes|node_netstat_Icmp6_InErrors|node_netstat_Icmp6_InMsgs|node_netstat_Icmp6_OutMsgs|node_netstat_Icmp_InErrors|node_netstat_Icmp_InMsgs|node_netstat_Icmp_OutMsgs|node_netstat_IpExt_InOctets|node_netstat_IpExt_OutOctets|node_netstat_TcpExt_ListenDrops|node_netstat_TcpExt_ListenOverflows|node_netstat_TcpExt_TCPSynRetrans|node_netstat_Tcp_InErrs|node_netstat_Tcp_InSegs|node_netstat_Tcp_OutRsts|node_netstat_Tcp_OutSegs|node_netstat_Tcp_RetransSegs|node_netstat_Udp6_InDatagrams|node_netstat_Udp6_InErrors|node_netstat_Udp6_NoPorts|node_netstat_Udp6_OutDatagrams|node_netstat_Udp6_RcvbufErrors|node_netstat_Udp6_SndbufErrors|node_netstat_UdpLite_InErrors|node_netstat_Udp_InDatagrams|node_netstat_Udp_InErrors|node_netstat_Udp_NoPorts|node_netstat_Udp_OutDatagrams|node_netstat_Udp_RcvbufErrors|node_netstat_Udp_SndbufErrors|node_network_carrier|node_network_info|node_network_mtu_bytes|node_network_receive_bytes_total|node_network_receive_compressed_total|node_network_receive_drop_total|node_network_receive_errs_total|node_network_receive_fifo_total|node_network_receive_multicast_total|node_network_receive_packets_total|node_network_speed_bytes|node_network_transmit_bytes_total|node_network_transmit_compressed_total|node_network_transmit_drop_total|node_network_transmit_errs_total|node_network_transmit_fifo_total|node_network_transmit_multicast_total|node_network_transmit_packets_total|node_network_transmit_queue_length|node_network_up|node_nf_conntrack_entries|node_nf_conntrack_entries_limit|node_os_info|node_sockstat_FRAG6_inuse|node_sockstat_FRAG_inuse|node_sockstat_RAW6_inuse|node_sockstat_RAW_inuse|node_sockstat_TCP6_inuse|node_sockstat_TCP_alloc|node_sockstat_TCP_inuse|node_sockstat_TCP_mem|node_sockstat_TCP_mem_bytes|node_sockstat_TCP_orphan|node_sockstat_TCP_tw|node_sockstat_UDP6_inuse|node_sockstat_UDPLITE6_inuse|node_sockstat_UDPLITE_inuse|node_sockstat_UDP_inuse|node_sockstat_UDP_mem|node_sockstat_UDP_mem_bytes|node_sockstat_sockets_used|node_softnet_dropped_total|node_softnet_processed_total|node_softnet_times_squeezed_total|node_systemd_unit_state|node_textfile_scrape_error|node_time_zone_offset_seconds|node_timex_estimated_error_seconds|node_timex_maxerror_seconds|node_timex_offset_seconds|node_timex_sync_status|node_uname_info|node_vmstat_oom_kill|node_vmstat_pgfault|node_vmstat_pgmajfault|node_vmstat_pgpgin|node_vmstat_pgpgout|node_vmstat_pswpin|node_vmstat_pswpout|process_max_fds|process_open_fds
        source_labels:
          - __name__
  prometheus_remote_write:
    - basic_auth:
        password: x
        username: x
      url: https://x.grafana.net/api/prom/push
  agent:
    enabled: true
    relabel_configs:
      - action: replace
        source_labels:
          - agent_hostname
        target_label: instance
      - action: replace
        target_label: job
        replacement: "integrations/agent-check"
    metric_relabel_configs:
      - action: keep
        regex: (prometheus_target_sync_length_seconds_sum|prometheus_target_scrapes_.*|prometheus_target_interval.*|prometheus_sd_discovered_targets|agent_build.*|agent_wal_samples_appended_total|process_start_time_seconds)
        source_labels:
          - __name__
  # Add here any snippet that belongs to the `integrations` section.
  # For a correct indentation, paste snippets copied from Grafana Cloud at the beginning of the line.
logs:
  configs:
    - clients:
        - basic_auth:
            password: x
            username: x
          url: https://x.grafana.net/loki/api/v1/push
      name: integrations
      positions:
        filename: /tmp/positions.yaml
      scrape_configs:
        # Add here any snippet that belongs to the `logs.configs.scrape_configs` section.
        # For a correct indentation, paste snippets copied from Grafana Cloud at the beginning of the line.
metrics:
  configs:
    - name: integrations
      remote_write:
        - basic_auth:
            password: x
            username: x
          url: https://x.grafana.net/api/prom/push
      scrape_configs:
        - job_name: integrations/nodejs
          static_configs:
            - targets: ['localhost:4001']
          relabel_configs:
            - replacement: 'x'
              target_label: instance
          metric_relabel_configs:
            - action: keep
              regex: nodejs_active_handles_total|nodejs_active_requests_total|nodejs_eventloop_lag_p50_seconds|nodejs_eventloop_lag_p99_seconds|nodejs_eventloop_lag_seconds|nodejs_external_memory_bytes|nodejs_gc_duration_seconds_count|nodejs_gc_duration_seconds_sum|nodejs_heap_size_total_bytes|nodejs_heap_size_used_bytes|nodejs_heap_space_size_used_bytes|nodejs_version_info|process_cpu_seconds_total|process_cpu_system_seconds_total|process_cpu_user_seconds_total|process_resident_memory_bytes|process_start_time_seconds|http_request_duration_seconds
              source_labels:
                - __name__
        # Add here any snippet that belongs to the `metrics.configs.scrape_configs` section.
        # For a correct indentation, paste snippets copied from Grafana Cloud at the beginning of the line.
  global:
    scrape_interval: 60s
  wal_directory: /tmp/grafana-agent-wal
And here is the /metrics output, which provides the Node.js vitals and my new http_request_duration_seconds metric:
# HELP nodejs_gc_duration_seconds Garbage collection duration by kind, one of major, minor, incremental or weakcb.
# TYPE nodejs_gc_duration_seconds histogram
nodejs_gc_duration_seconds_bucket{le="0.001",kind="minor"} 40
nodejs_gc_duration_seconds_bucket{le="0.01",kind="minor"} 3108
nodejs_gc_duration_seconds_bucket{le="0.1",kind="minor"} 3366
nodejs_gc_duration_seconds_bucket{le="1",kind="minor"} 3369
nodejs_gc_duration_seconds_bucket{le="2",kind="minor"} 3369
nodejs_gc_duration_seconds_bucket{le="5",kind="minor"} 3369
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="minor"} 3369
nodejs_gc_duration_seconds_sum{kind="minor"} 15.322641205787649
nodejs_gc_duration_seconds_count{kind="minor"} 3369
nodejs_gc_duration_seconds_bucket{le="0.001",kind="incremental"} 117
nodejs_gc_duration_seconds_bucket{le="0.01",kind="incremental"} 168
nodejs_gc_duration_seconds_bucket{le="0.1",kind="incremental"} 168
nodejs_gc_duration_seconds_bucket{le="1",kind="incremental"} 168
nodejs_gc_duration_seconds_bucket{le="2",kind="incremental"} 168
nodejs_gc_duration_seconds_bucket{le="5",kind="incremental"} 168
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="incremental"} 168
nodejs_gc_duration_seconds_sum{kind="incremental"} 0.1601880836486817
nodejs_gc_duration_seconds_count{kind="incremental"} 168
nodejs_gc_duration_seconds_bucket{le="0.001",kind="major"} 0
nodejs_gc_duration_seconds_bucket{le="0.01",kind="major"} 0
nodejs_gc_duration_seconds_bucket{le="0.1",kind="major"} 81
nodejs_gc_duration_seconds_bucket{le="1",kind="major"} 81
nodejs_gc_duration_seconds_bucket{le="2",kind="major"} 81
nodejs_gc_duration_seconds_bucket{le="5",kind="major"} 81
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="major"} 81
nodejs_gc_duration_seconds_sum{kind="major"} 2.6652107810974117
nodejs_gc_duration_seconds_count{kind="major"} 81
nodejs_gc_duration_seconds_bucket{le="0.001",kind="weakcb"} 823
nodejs_gc_duration_seconds_bucket{le="0.01",kind="weakcb"} 828
nodejs_gc_duration_seconds_bucket{le="0.1",kind="weakcb"} 1151
nodejs_gc_duration_seconds_bucket{le="1",kind="weakcb"} 1154
nodejs_gc_duration_seconds_bucket{le="2",kind="weakcb"} 1154
nodejs_gc_duration_seconds_bucket{le="5",kind="weakcb"} 1154
nodejs_gc_duration_seconds_bucket{le="+Inf",kind="weakcb"} 1154
nodejs_gc_duration_seconds_sum{kind="weakcb"} 17.205023178100603
nodejs_gc_duration_seconds_count{kind="weakcb"} 1154
# HELP http_request_duration_seconds duration histogram of http responses labeled with: status_code, method, path
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.003",status_code="204",method="POST",path="/x"} 0
http_request_duration_seconds_bucket{le="0.03",status_code="204",method="POST",path="/x"} 0
http_request_duration_seconds_bucket{le="0.1",status_code="204",method="POST",path="/x"} 1
http_request_duration_seconds_bucket{le="0.3",status_code="204",method="POST",path="/x"} 1
http_request_duration_seconds_bucket{le="1.5",status_code="204",method="POST",path="/x"} 1
http_request_duration_seconds_bucket{le="10",status_code="204",method="POST",path="/x"} 1
http_request_duration_seconds_bucket{le="+Inf",status_code="204",method="POST",path="/x"} 1
http_request_duration_seconds_sum{status_code="204",method="POST",path="/x"} 0.075442879
http_request_duration_seconds_count{status_code="204",method="POST",path="/x"} 1
The metrics browser shows nodejs_gc_duration, but http_request_duration_seconds is not found.
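To narrow this down, here is a minimal Python sketch of how I understand the `keep` rule to work. It assumes the agent applies `metric_relabel_configs` the way Prometheus does, with the regex fully anchored against the series name in `__name__`. The shortened regex, the series list and the `kept` helper are mine for illustration only and are not part of the agent.

```python
import re

# Shortened copy of the keep regex from the integrations/nodejs job
# (assumption: the remaining alternatives do not matter for this check).
KEEP_REGEX = (
    "nodejs_gc_duration_seconds_count|nodejs_gc_duration_seconds_sum|"
    "process_start_time_seconds|http_request_duration_seconds"
)

# Series names as they appear on /metrics. A histogram is exposed as
# separate _bucket, _sum and _count series, not under its bare name.
SERIES = [
    "nodejs_gc_duration_seconds_bucket",
    "nodejs_gc_duration_seconds_sum",
    "nodejs_gc_duration_seconds_count",
    "http_request_duration_seconds_bucket",
    "http_request_duration_seconds_sum",
    "http_request_duration_seconds_count",
]


def kept(name: str, regex: str = KEEP_REGEX) -> bool:
    """Return True if a fully anchored keep rule would retain this series."""
    # fullmatch mimics anchoring: the whole name must match, not a prefix.
    return re.fullmatch(regex, name) is not None


for name in SERIES:
    print(f"{name}: {'kept' if kept(name) else 'dropped'}")
```

If that assumption holds, only the `_sum` and `_count` series of `nodejs_gc_duration_seconds` pass the filter, and none of the three `http_request_duration_seconds_*` series do, which would be consistent with what I see in the Metrics Explorer.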
I’m pretty new to all this and am lost, as I cannot find anything about how to troubleshoot this kind of behaviour.
Thanks a lot in advance for your help.