SNMP retries not respected due to global timeout #76

Open
meni2029 opened this issue Aug 21, 2020 · 3 comments

Comments

@meni2029

Hello,
I'm looking at check_snmp_storage.pl, but this most likely applies to most if not all of the plugins.
From the code, I see that the script is forced to exit once the timeout (given as an argument, default 5 seconds) expires:

if (defined($o_timeout)) {
    verb("Alarm in $o_timeout seconds");
    alarm($o_timeout);
}

$SIG{'ALRM'} = sub {
    print "No answer from host $o_host:$o_port\n";
    exit $ERRORS{"UNKNOWN"};
};

The SNMP session uses the same timeout value and a retries value of 10:

        ($session, $error) = Net::SNMP->session(
            -hostname  => $o_host,
            -version   => 2,
            -community => $o_community,
            -port      => $o_port,
            -retries   => 10,
            -timeout   => $o_timeout,
            -domain    => $o_domain
        );

As I understand it, the retries can never take place: the script and the SNMP session share the same timeout, so the alarm forces the script to exit as soon as the first SNMP attempt times out.

Am I right?

Expected Behavior

The script does not exit before the SNMP retries have been executed.

Current Behavior

No SNMP retries are executed, because the script exits when the first SNMP attempt times out.

Possible Solution

One solution would be to calculate a global timeout as $o_timeout * 10:

if (defined($o_timeout)) {
    my $global_timeout = $o_timeout * 10;
    verb("Alarm in $global_timeout seconds");
    alarm($global_timeout);
}

$SIG{'ALRM'} = sub {
    print "No answer from host $o_host:$o_port\n";
    exit $ERRORS{"UNKNOWN"};
};
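
A variation on the calculation above, as a minimal sketch only: as far as I understand Net::SNMP, -retries counts retransmissions after the first attempt, so the worst case for a single request would be timeout * (retries + 1) rather than timeout * retries. The helper name global_timeout is hypothetical and not part of the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: worst-case wait in seconds for ONE SNMP request,
# assuming -retries counts retransmissions after the initial attempt.
sub global_timeout {
    my ($timeout, $retries) = @_;
    return $timeout * ($retries + 1);
}

my $o_timeout = 5;     # plugin default mentioned above
my $retries   = 10;    # value hard-coded in the session call
my $alarm_in  = global_timeout($o_timeout, $retries);

print "Alarm in $alarm_in seconds\n";    # prints "Alarm in 55 seconds"
```

This bounds a single request only. A check that sends several requests in one run could still exceed it, so the alarm would remain a safety net rather than an exact budget.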

Context

On one monitored Linux host we get "No answer from host ip:161" from time to time.

@SteScho
Owner

SteScho commented Aug 25, 2020

Hi

For your context: just increase the timeout, or adjust the check interval and the retry count in your monitoring system.

In general: I don't know why the retry option is even set for this check; other checks don't set it. And yes, in terms of time, the check only makes one attempt, and I think that is enough. Checks should finish quickly, so I would rather not wait for retries at this point.

@meni2029
Author

Hi @SteScho, thanks for your prompt reply.

In general: My point about SNMP retries is that a single run of the check can issue more than 10 SNMP queries (depending on the number of storage partitions), and if one of them gets lost the whole check fails with a timeout. On the other hand, I agree that checks should finish quickly, and 10 SNMP retries is certainly too many.

Our context: In the end I found out that our issue is not lost SNMP queries but a storage partition that goes missing intermittently. When check_snmp_storage.pl runs, a partition disappears between the get of the index table and the get of the storage values, so the expected OIDs are missing and the check times out. As a workaround we filtered out the offending partition, as it is not an important one anyway.

You may close this issue.

@SteScho
Owner

SteScho commented Aug 28, 2020

Hi.

How often do you run into that situation? If it helps, I could imagine adding an option for the number of retries, set to 1 by default. That fits my view that the check should be quick, while still helping in your special situation. And of course it may be helpful to others, too.

On the other hand, this is a feature that has been missing so far. A default of 1 does not change the check's behavior, but the additional option can add value in cases like yours. Sounds good.

So feel free to create a PR - I will merge it to the code.
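
A rough sketch of how such an option could be parsed, assuming Getopt::Long is used for the command line. The flag name --retries, the variable $o_retries and the helper parse_retries are hypothetical and do not exist in the plugin today.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical parser: --retries is an assumed flag name, not an existing one.
sub parse_retries {
    my @args = @_;
    my $o_retries = 1;    # default of 1, as proposed in this thread
    GetOptionsFromArray(\@args, 'retries=i' => \$o_retries)
        or die "Invalid arguments\n";
    die "--retries must not be negative\n" if $o_retries < 0;
    return $o_retries;
}

print parse_retries(), "\n";                  # no flag given: prints 1
print parse_retries('--retries', 3), "\n";    # explicit value: prints 3
```

The parsed value would then be passed as -retries => $o_retries in the Net::SNMP->session call, with the script's alarm scaled to leave room for the retries.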
