Check_nt and check_nrpe returning different values in Grafana

neworderfac33 · September 12, 2018, 1:49pm

Good afternoon - I have set up a dashboard with graphs that monitor memory usage, based on the following Nagios command and service definitions:

define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
        }
# Monitor TOTAL memory usage with check_nt
define service{
        use                     generic-service
        #host_name              MyServer
        hostgroup_name          MyServers
        service_description     Win_TotMem_NT
        check_command           check_nt!MEMUSE!-w 90 -c 95
        }

However, the values returned seem higher than if I remote into the servers individually, so I tried to replicate the above functionality with check_nrpe using the following:

define command{
        command_name    check_nrpe_totmem
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=80% MaxCrit=90% ShowAll=long
        }
define service{
        use                     generic-service
        #host_name              MyServer
        hostgroup_name          MyServers
        service_description     Win_TotMem_NRPE
        check_command           check_nrpe_totmem
        }

The values returned in Nagios for the check_nt and check_nrpe services match each other. However, when I try to set up graphs for these in a Grafana dashboard, the check_nt charts work fine but the check_nrpe ones don’t - they pretty much flatline. I have the Y axis unit set to megabytes, and whilst check_nt reports values in the range 2.0-3.0 GB, check_nrpe shows minimal MB values.
I’m assuming that this is something to do with the way in which the check_nrpe data is being parsed, but I have no idea where to start in resolving this, so any advice would be gratefully received!
Thanks in advance
Pete

neworderfac33 · September 13, 2018, 3:33pm

Update: what I have discovered since yesterday is that Grafana picks up data from the check_nt service as a GB value, but data from the check_nrpe service as a % value. Once I changed the Y axis of the check_nrpe graph to plot % (0-100%), the chart pairs matched up - kind of.

Each server has 4GB of physical memory and 6GB of virtual memory, 10GB in all. So you would reasonably expect the GB chart and the % chart to look identical, and they do, up to a point: they have the same peaks and troughs at the same times. But you would expect (for example) 3GB usage on one chart to correspond to 30% usage on the other, and the values on the % chart are lower than that.

AND, both plugins return the same values in the Nagios UI and from the CLI!

/usr/local/nagios/libexec/check_nt -H 99.99.99.99 -p 12489 -v MEMUSE
Memory usage: total:10239.64 MB - used: 2568.19 MB (25%) - free: 7671.46 MB (75%) | 'Memory usage'=2568.19MB;0.00;0.00;0.00;10239.64

/usr/local/nagios/libexec/check_nrpe -H 99.99.99.99 -p5666 -c CheckMEM -a MaxWarn=80% MaxCrit=90% ShowAll=long
OK: committed: Total: 10GB - Used: 2.508GB (25%) - Free: 7.492GB (74%), physical: Total: 4GB - Used: 1.202GB (30%) - Free: 2.798GB (69%)|'committed'=2.50799GB;7.99965;8.9996;0;9.99956 'committed %'=25%;79;89;0;100 'physical'=1.20167GB;3.19965;3.5996;0;3.99956 'physical %'=30%;79;89;0;100
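
For comparing the two outputs, here is a minimal Python sketch that splits the performance data (the text after the "|") into its individual metrics. The two strings are copied from the CLI output above; the names `parse_perfdata`, `PERF_RE` and `TO_MB` are made up for illustration, and the 1 GB = 1024 MB factor is an assumption that happens to agree with these samples. It does not show which metric Grafana actually plots, only what each plugin offers: check_nt emits one metric in MB, while check_nrpe emits four, in GB and in %, for committed and physical memory.

```python
import re

# Perfdata strings copied from the CLI output above (text after the "|").
CHECK_NT = "'Memory usage'=2568.19MB;0.00;0.00;0.00;10239.64"
CHECK_NRPE = (
    "'committed'=2.50799GB;7.99965;8.9996;0;9.99956 "
    "'committed %'=25%;79;89;0;100 "
    "'physical'=1.20167GB;3.19965;3.5996;0;3.99956 "
    "'physical %'=30%;79;89;0;100"
)

# Layout of one metric: 'label'=value[unit];warn;crit;min;max
PERF_RE = re.compile(
    r"'(?P<label>[^']+)'=(?P<value>-?[\d.]+)(?P<uom>[^;\s]*)"
    r";(?P<warn>[^;\s]*);(?P<crit>[^;\s]*);(?P<min>[^;\s]*);(?P<max>[^;\s]*)"
)

# Assumption: 1 GB = 1024 MB (consistent with the sample values above).
TO_MB = {"MB": 1.0, "GB": 1024.0}


def parse_perfdata(text):
    """Return {label: {...}} for every metric found in a perfdata string."""
    metrics = {}
    for m in PERF_RE.finditer(text):
        value = float(m.group("value"))
        uom = m.group("uom")
        maximum = float(m.group("max"))
        entry = {"value": value, "uom": uom, "max": maximum}
        if uom in TO_MB:
            # Normalise size metrics to MB and relate them to their own maximum.
            entry["value_mb"] = value * TO_MB[uom]
            entry["percent_of_max"] = 100.0 * value / maximum
        metrics[m.group("label")] = entry
    return metrics


nt = parse_perfdata(CHECK_NT)
nrpe = parse_perfdata(CHECK_NRPE)

for source, metrics in (("check_nt", nt), ("check_nrpe", nrpe)):
    for label, entry in metrics.items():
        print(source, repr(label), entry)
```

Two things stand out when the metrics are listed side by side. The check_nrpe size metrics are in GB, so plotting them unconverted on a megabyte axis would show values around 2.5 rather than around 2568. And the two % metrics have different bases: 'committed %' is relative to the 10GB total, 'physical %' to the 4GB of physical memory.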

So now I have something ELSE to look at!
Pete

neworderfac33 · October 8, 2018, 9:20am

I don’t suppose anyone has had any thoughts on this, have they? To recap, measuring memory usage using check_nt and check_nrpe returns the same results in Nagios, but the Grafana graphs show different values (though with peaks and troughs at the same points). The issue has been outstanding for a month now and I’m coming under some pressure from The Powers That Be to get it resolved.
Thanks
Pete