
No remote servers RAID status: check_raid UNKNOWN #194

Open
HHawk opened this issue May 14, 2019 · 16 comments

@HHawk

HHawk commented May 14, 2019

Hi glensc!

Sorry if my question is really stupid / silly. I am new to Nagios and I have to set this up again from scratch, as our in-house technician (and close friend) recently passed away. I managed to get most things working; however, I have some issues with your script.

I had some issues with missing certain Perl modules, but eventually managed to get those installed and working.

If I run the command locally on the monitored server, your script is working perfectly.
For example:

[Remote.Server]# sh /usr/lib64/nagios/plugins/check_raid.sh
OK: megacli:[Volumes(1): DISK0.0:Optimal; Devices(2): 01,02=Online]

However, on the Nagios Core (4.4.3) server I don't get any RAID status:
check_raid UNKNOWN - No active plugins (No RAID found)

My guess is that it's checking the Nagios Core server itself, which has no RAID configured, hence the "No RAID found" message.

I have looked at your check_raid.cfg example and applied the following (on the Nagios Core server):

/usr/local/nagios/etc/objects/commands.cfg

define command {
        command_name    check_raid
        command_line    $USER1$/check_raid $ARG1$
        #command_line   $USER1$/check_raid $ARG1$ $HOSTADDRESS$
        #command_line   $USER1$/check_raid -H $HOSTADDRESS$ $ARG1$
}

/usr/local/nagios/etc/objects/vz_nodes.cfg

define service{
        use                             generic-service         ; Name of service template to use
        hostgroup_name                  vz_nodes
        service_description             Check RAID
        check_command                   check_raid
        #check_command                  check_nrpe!check_raid -t 60
        #check_command                  check_nrpe!check_raid
        normal_check_interval           120
        retry_check_interval            5
        notification_interval           3600
}

/usr/local/nagios/etc/nrpe.cfg

command[check_raid]=/usr/local/nagios/libexec/check_raid
#command[check_raid]=/usr/local/nagios/libexec/check_raid $HOSTADDRESS$

As you can see, I tried a lot of things, hence the commented-out (#) lines. But none of it works.

I really don't know what I am doing wrong. I probably made a mistake somewhere, but I have no clue where!

As mentioned above, running the command locally on the server that needs to be checked works without issues. The problem is that it's not working from the Nagios Core server.

Your plugin is located on the remote server in the folder: /usr/lib64/nagios/plugins/
And on the Nagios Core server it's located in the folder: /usr/local/nagios/libexec/
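As suspected in the comment above, `command_line $USER1$/check_raid $ARG1$` makes Nagios run the plugin on the Nagios Core server itself. In a usual NRPE setup the Nagios server calls `check_nrpe` against the monitored host, and the NRPE daemon there maps the name `check_raid` to its local copy of the plugin. The sketch below writes out both halves. It is not the project's documented configuration; it assumes `check_nrpe` is not already defined elsewhere in the object files, that the new service replaces the existing "Check RAID" definition in vz_nodes.cfg instead of duplicating it, and that the plugin on the monitored host is /usr/lib64/nagios/plugins/check_raid.pl. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that write out the two halves of an
# NRPE-based check. Paths and file names are assumptions; adjust them.

# Nagios server side: call check_nrpe against the host instead of running
# check_raid locally. $USER1$, $HOSTADDRESS$ and $ARG1$ are Nagios macros,
# so the heredoc delimiter is quoted to keep the shell from expanding them.
write_server_cfg() {
  cat > "$1" <<'EOF'
define command {
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define service {
        use                     generic-service
        hostgroup_name          vz_nodes
        service_description     Check RAID
        check_command           check_nrpe!check_raid
}
EOF
}

# Monitored host side: NRPE maps the name "check_raid" to the local plugin.
# Nagios macros such as $HOSTADDRESS$ are not expanded in nrpe.cfg.
write_nrpe_command() {
  echo 'command[check_raid]=/usr/lib64/nagios/plugins/check_raid.pl' >> "$1"
}
```

For example, run `write_server_cfg /usr/local/nagios/etc/objects/check_raid_nrpe.cfg` on the Nagios server, reference that file from nagios.cfg with a `cfg_file=` line, and verify the result with `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`. On the monitored host, `write_nrpe_command` appends to whichever nrpe.cfg the daemon actually reads (edit an existing `command[check_raid]` line instead if there is one), and NRPE needs a restart afterwards.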

Side note: since I was missing quite a few Perl modules, I used the method you described in a very old reply here. I used the following:

yum install perl-core -y
curl -L https://cpanmin.us | perl - App::cpanminus
/usr/local/bin/cpanm Monitoring/Plugin.pm
/usr/local/bin/cpanm ExtUtils/MakeMaker/CPANfile.pm
/usr/local/bin/cpanm Module::Pluggable

Is this all really needed? Isn't there a better way?

Thank you kindly in advance! I hope you can shed your light on this. Probably it's something dumb/simple. Sorry.

Regards,
HHawk

HHawk changed the title from "Unfortunately cannot get remote servers RAID status:" to "No remote servers RAID status: check_raid UNKNOWN" on May 14, 2019
@glensc
Owner

glensc commented May 14, 2019

sorry for your loss!

regarding installing and modules: you should be using released versions:

these have all dependencies embedded in the check_raid.pl file.

this way of installing is described in the readme:

installing dependencies for development mode is, indeed, not really documented:

if your nagios server does not execute properly, you likely have not set up sudo rules, also documented in the readme:

@glensc
Owner

glensc commented May 14, 2019

regarding remote execution, does nrpe answer the command?

$ /usr/lib/nagios/plugins/check_nrpe -H remote.host -c check_raid
OK: mdstat:[md0(54.81 MiB raid1):UU, md1(923.45 GiB raid1):UU]

@glensc
Copy link
Owner

glensc commented May 14, 2019

also, please link to that "very old reply"

@HHawk
Author

HHawk commented May 14, 2019

Hi glensc,

Thank you for your kind reply. Appreciated!
I never expected you to answer this quickly! Props.

I did follow the install instructions, as you described in your first reply: I downloaded the .zip file and unzipped it on the server. Anyway, I downloaded check_raid.pl (from the link you gave), redid everything once again, and ran this on the remote server:

[root@remoteserver ~]# ./check_raid.pl -S
Updating file /etc/sudoers.d/check_raid
/etc/sudoers.d/check_raid.new.326651: parsed OK
/etc/sudoers.d/check_raid file not changed.
check_raid OK - sudoers not updated

So the rules are correct, since they didn't change? So that's not it, IMHO.

I also did it on the Nagios server (the result was a bit different though):
[root@nagioserver~]# ./check_raid.pl -S

/dev/mapper/control: open failed: Operation not permitted
Failure to communicate with kernel device-mapper driver.
Check that device-mapper is available in the kernel.
Incompatible libdevmapper 1.02.149-RHEL7 (2018-07-20) and kernel driver (unknown version).
Command failed.
/dev/mapper/control: open failed: Operation not permitted
Failure to communicate with kernel device-mapper driver.
Check that device-mapper is available in the kernel.
Incompatible libdevmapper 1.02.149-RHEL7 (2018-07-20) and kernel driver (unknown version).
Command failed.
Your configuration does not need to use sudo, sudoers not updated
check_raid OK - sudoers not updated

The Nagios server is running CentOS 7.6 with the latest updates.

I entered the command once again to check the remote server:
/usr/local/nagios/libexec/check_nrpe -H 192.168.0.2 -c check_raid
NRPE: Unable to read output

But executing the command on the server itself works great.

In regard to the very old reply: I had to dig for the old installation post, but I found it here: #147

I have no clue why it's going wrong though.

Also I ran: rpm -ivh https://github.com/glensc/nagios-plugin-check_raid/releases/download/4.0.9/nagios-plugin-check_raid-4.0.9-1.noarch.rpm

Retrieving https://github.com/glensc/nagios-plugin-check_raid/releases/download/4.0.9/nagios-plugin-check_raid-4.0.9-1.noarch.rpm
error: Failed dependencies:
        /usr/lib/nagios/plugins is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::Plugin) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::Plugins::hpacucli) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::Plugins::lsscsi) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::Plugins::smartctl) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::SerialLine) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::Sudoers) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(App::Monitoring::Plugin::CheckRaid::Utils) is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(Module::Pluggable) >= 5.1 is needed by nagios-plugin-check_raid-4.0.9-1.noarch
        perl(Monitoring::Plugin) >= 0.37 is needed by nagios-plugin-check_raid-4.0.9-1.noarch

I probably shouldn't be using it. But don't I already have the latest Module::Pluggable and Monitoring::Plugin Perl modules? See below:

/usr/local/bin/cpanm Module::Pluggable
Module::Pluggable is up to date. (5.2)

/usr/local/bin/cpanm Monitoring::Plugin
Monitoring::Plugin is up to date. (0.40)

So really no clue...?

@HHawk
Author

HHawk commented May 14, 2019

Update! Made some progress. For some reason check_raid was named check_raid.sh on the remote server.
When running the command:

/usr/local/nagios/libexec/check_nrpe -H 192.168.0.2 -c check_raid
results in:
CRITICAL: megacli:[Volumes(0): ; Devices(0): ]

Still not the result it should display, but at least it's showing something now.

//edit

SELinux is disabled on both servers, by the way.

@glensc
Owner

glensc commented May 14, 2019

unclear from the messages, does check_raid.pl (the all-in-one version from downloads) work for you:

  1. on remote server as root?
  2. on remote server as nagios user (sudo to it)?
  3. on remote server via nrpe?

you should troubleshoot in that order
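The three checks can be scripted on the monitored server so that each run prints its output and the resulting state. This is a sketch, not part of check_raid: the plugin path, the `nagios` user, and a local `check_nrpe` binary (one is copied to the monitored host later in this thread) are assumptions, as are the helper names. Exit codes are mapped using the standard Nagios plugin convention of 0 OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN.

```shell
#!/bin/sh
# Sketch of the three-step check, run as root on the monitored server.
# Assumed paths; adjust them to where the files live on your host.
PLUGIN=${PLUGIN:-/usr/lib64/nagios/plugins/check_raid.pl}
CHECK_NRPE=${CHECK_NRPE:-/usr/lib64/nagios/plugins/check_nrpe}

# Map a plugin exit code to the usual Nagios state name.
state_name() {
  case "$1" in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    *) echo UNKNOWN ;;
  esac
}

# Run one step, show its output and state, and return the command's exit code.
run_step() {
  label=$1
  shift
  echo "== $label: $*"
  "$@"
  rc=$?
  echo "== $label: exit $rc ($(state_name "$rc"))"
  return "$rc"
}

# 1. as root, 2. as the nagios user, 3. through the local NRPE daemon.
troubleshoot() {
  run_step "1 root" "$PLUGIN"
  run_step "2 nagios" sudo -u nagios "$PLUGIN"
  run_step "3 nrpe" "$CHECK_NRPE" -H 127.0.0.1 -c check_raid
}
```

Run `troubleshoot` as root. Step 2 uses `sudo -u nagios`, which runs the command directly rather than through the account's login shell, so it also works when `su nagios` reports "This account is currently not available".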

@HHawk
Author

HHawk commented May 14, 2019

I just redid everything once again. Also making sure I am using the correct version: 4.0.9 (also tried the dev version, but same result).

1 On remote server as root: OK: megacli:[Volumes(1): DISK0.0:Optimal; Devices(2): 02,01=Online]

2a On remote server: su nagios results in: This account is currently not available.

2b Enabled the account, logged in as nagios and ran the command: OK: megacli:[Volumes(1): DISK0.0:Optimal; Devices(2): 02,01=Online]

3 I don't know how to test this. I don't have the check_nrpe command on the remote server.

I installed several other plugins, including "Check Linux Stats", and all of them work correctly. Very strange. :(

@HHawk
Author

HHawk commented May 14, 2019

I am gonna reboot both servers. I doubt it will help, but I am going crazy here. Trying everything now. :(

@HHawk
Author

HHawk commented May 14, 2019

Running the command through SSH from the Nagios server to the remote server also works:
ssh -t 192.168.0.2 /usr/lib64/nagios/plugins/check_raid.pl -p megacli
OK: megacli:[Volumes(1): DISK0.0:Optimal; Devices(2): 02,01=Online]
Connection to 192.168.0.2 closed.

Output of /etc/sudoers.d/check_raid on the remote server:

User_Alias CHECK_RAID=nagios, icinga, sensu
Defaults:CHECK_RAID !requiretty
CHECK_RAID ALL=(root) NOPASSWD: /sbin/MegaCli -PDList -aALL -NoLog
CHECK_RAID ALL=(root) NOPASSWD: /sbin/MegaCli -LdInfo -Lall -aALL -NoLog
CHECK_RAID ALL=(root) NOPASSWD: /sbin/MegaCli -AdpBbuCmd -GetBbuStatus -aALL -NoLog

@HHawk
Author

HHawk commented May 14, 2019

I am gonna redo it on a different server. I doubt it will help, but it's worth a shot.
Will post back later. If you have any idea what to try next, let me know. Thank you.

@glensc
Owner

glensc commented May 14, 2019

regarding testing over nrpe:

you already set up nrpe:

so, run the nrpe command:

@HHawk
Author

HHawk commented May 14, 2019

Okay... I installed your script on a completely new server. But exactly the same issue.

./check_raid.pl -V
check_raid 4.0.9

./check_raid.pl -S
Your configuration does not need to use sudo, sudoers not updated
check_raid OK - sudoers not updated

On the remote server:

/usr/lib64/nagios/plugins/check_raid
OK: megacli:[Volumes(1): DISK0.0:Optimal; Devices(5): 04=Hotspare 00,01,02,03=Online]

From the Nagios server:

/usr/local/nagios/libexec/check_nrpe -H 192.168.0.101 -c check_raid
CRITICAL: megacli:[Volumes(0): ; Devices(0): ]

Yes, NRPE is installed on the remote server, but there is no "check_nrpe" command there, and I don't know how to run "nrpe" by itself. Running "nrpe -V" on the remote server gives:

NRPE - Nagios Remote Plugin Executor Version: 3.2.1

I have already run the command a ton of times on the Nagios server, with no luck. That's why I created this ticket:

/usr/local/nagios/libexec/check_nrpe -H 192.168.0.101 -c check_raid
CRITICAL: megacli:[Volumes(0): ; Devices(0): ]

So I have no idea how to run check_nrpe on the remote server. Anyway, I will copy it over and try it that way. Will report back ASAP.

@HHawk
Author

HHawk commented May 14, 2019

Copied over check_nrpe and ran it locally on the remote server:

/usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_raid
CRITICAL: megacli:[Volumes(0): ; Devices(0): ]

:(

Also tried sudo. Same result.

@glensc
Owner

glensc commented May 15, 2019

given that megacli is detected but returns incomplete results, I guess there are some sudo execution problems. did you look at the logs in /var/log?

/usr/local/nagios/libexec/check_nrpe -H 192.168.0.101 -c check_raid
CRITICAL: megacli:[Volumes(0): ; Devices(0): ]

you can run check_raid with -d; it will show what commands it attempts to execute. repeat the same as root and as the user the nrpe daemon runs as.

and try the individual commands as root and as the nagios user.

a random guess: megacli is in a PATH that is accessible to root but not to the nagios user, or the nagios user finds a different megacli program.

and this time paste the full commands and output, so it's clear which user you are running as, e.g. sudo -u nagios /path/to/check_raid.pl -d
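To test the PATH guess against the running daemon rather than an interactive shell, one option is to read the NRPE process's own environment; `sudo -u` may replace PATH with the sudoers `secure_path`, so it does not necessarily show what the daemon sees. A sketch under assumptions: a Linux /proc filesystem, a daemon process named `nrpe`, and running as root (another user's /proc/PID/environ is not readable). The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare where root finds MegaCli with where the NRPE daemon's
# PATH would find it. Assumes Linux /proc and a process named "nrpe".

# Print the PATH value from a NUL-separated environ file (e.g. /proc/PID/environ).
path_from_environ() {
  tr '\0' '\n' < "$1" | sed -n 's/^PATH=//p'
}

# Search a colon-separated PATH for an executable file; first match wins.
# An empty entry is treated as the current directory.
find_in_path() {
  search=$1
  cmd=$2
  old_ifs=$IFS
  IFS=:
  for dir in $search; do
    if [ -x "${dir:-.}/$cmd" ] && [ ! -d "${dir:-.}/$cmd" ]; then
      IFS=$old_ifs
      echo "${dir:-.}/$cmd"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Print both lookups side by side (run as root).
compare_megacli() {
  pid=$(pgrep -o -x nrpe) || { echo "nrpe is not running" >&2; return 1; }
  echo "root:   $(command -v MegaCli || echo 'not found')"
  echo "daemon: $(find_in_path "$(path_from_environ "/proc/$pid/environ")" MegaCli || echo 'not found')"
}
```

Run `compare_megacli` as root; a missing or different path on the daemon line would support the PATH theory, while identical paths point back to sudo.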

@HHawk
Author

HHawk commented May 21, 2019

Sorry for the late reply; I was ill the past few days.

Anyway, in the end I got it fixed. I needed to do the following on the remote server:

touch /etc/sudoers.d/nrpe
echo "Defaults:nrpe !requiretty" >> /etc/sudoers.d/nrpe
echo "nrpe ALL=(ALL) NOPASSWD: /usr/lib64/nagios/plugins/" >> /etc/sudoers.d/nrpe

I don't recall where I read it; it was on some website somewhere. But these changes finally made it work for me.

Regards
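Writing the drop-in with `echo >>` appends duplicate lines if the commands are run twice, and a syntax error in a sudoers file can break sudo entirely. A more cautious sketch writes the content to a temporary file, checks it with `visudo -cf`, and only then installs it with mode 0440; the function names are hypothetical. Two sudoers details are worth knowing here: files in /etc/sudoers.d whose names contain a `.` or end in `~` are skipped, and a command path ending in `/` (as in the rule above) lets nrpe run any file in that directory, as any user including root, which is broader than the per-command rules check_raid generates.

```shell
#!/bin/sh
# Sketch: install a sudoers drop-in from stdin only if visudo accepts it.

# Syntax check kept in its own function so it is easy to swap out.
validate_sudoers() {
  visudo -cf "$1" >/dev/null
}

install_sudoers_dropin() {
  target=$1
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  if ! validate_sudoers "$tmp"; then
    echo "sudoers check failed; $target left unchanged" >&2
    rm -f "$tmp"
    return 1
  fi
  # Replace (not append to) the target; 0440 is the usual sudoers file mode.
  install -m 0440 "$tmp" "$target"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, as root: `printf '%s\n' 'Defaults:nrpe !requiretty' 'nrpe ALL=(ALL) NOPASSWD: /usr/lib64/nagios/plugins/' | install_sudoers_dropin /etc/sudoers.d/nrpe`. Running it again rewrites the same two lines instead of adding more.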

@glensc
Owner

glensc commented May 21, 2019

so, this contains information that you did not provide, even though I asked repeatedly:

  • which user your nrpe daemon runs as. the answer is "nrpe", while check_raid assumes it is "nagios"

given your previous /etc/sudoers.d/check_raid:

you can add nrpe to this list:

User_Alias CHECK_RAID=nagios, icinga, sensu, nrpe

that should make it work without the unknown changes you don't even have an origin link for.

if that's the case (user nrpe not handled), please include details on how you got your nrpe server installed, so I could include that information in the changelog.
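The suggested change can be scripted: read the daemon user from nrpe.cfg and append it to the `User_Alias CHECK_RAID` line. A sketch under assumptions: nrpe.cfg sets `nrpe_user=` (some packagings set the daemon user in the service unit instead, so `ps -o user= -C nrpe` is a useful cross-check), and the helper names are hypothetical. The function prints the updated file rather than editing it in place, so the result can be checked with `visudo -cf` before it replaces /etc/sudoers.d/check_raid.

```shell
#!/bin/sh
# Sketch: add the NRPE daemon user to the CHECK_RAID alias generated by
# check_raid.pl -S. Hypothetical helpers; paths are assumptions.

# Print the nrpe_user= value from an nrpe.cfg; commented lines are ignored
# and, if the setting appears more than once, the last one is printed.
nrpe_user_from_cfg() {
  sed -n 's/^[[:space:]]*nrpe_user[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | tail -n 1
}

# Print FILE with USER appended to "User_Alias CHECK_RAID=..." unless it is
# already listed. FILE itself is not modified.
add_check_raid_user() {
  file=$1
  user=$2
  [ -n "$user" ] || return 1
  if grep -Eq "^User_Alias[[:space:]]+CHECK_RAID[[:space:]]*=(.*,)?[[:space:]]*$user[[:space:]]*(,|\$)" "$file"; then
    cat "$file"
  else
    sed "s/^\(User_Alias[[:space:]][[:space:]]*CHECK_RAID[[:space:]]*=.*\)\$/\1, $user/" "$file"
  fi
}
```

For example, as root: `add_check_raid_user /etc/sudoers.d/check_raid "$(nrpe_user_from_cfg /etc/nagios/nrpe.cfg)" > /tmp/check_raid.new && visudo -cf /tmp/check_raid.new`, then install the new file over /etc/sudoers.d/check_raid with mode 0440. The /etc/nagios/nrpe.cfg location is an assumption about the NRPE package in use.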
