Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use timeout in connect function, #85

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dvoryanchikov
Copy link

@dvoryanchikov dvoryanchikov commented Jan 9, 2014

avoiding sigpipe in case one of remote monitor systems is unreachable that causing loss all other communications, maybe later should make it optional config parameter.

Example.

Monitoring 3 upses in upsmon.conf:

MONITOR remote@192.168.40.4 0 foo bar slave
MONITOR myups@localhost 0 user pass master
MONITOR snmp@localhost 0 user pass slave

MINSUPPLIES 0
SHUTDOWNCMD "/bin/true"
POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
POWERDOWNFLAG /etc/killpower

NOTIFYMSG ONLINE "Electricity OK at %s"
NOTIFYMSG ONBATT "Electricity Problem at %s"
NOTIFYMSG COMMBAD "Electricity Unknown at %s"
NOTIFYMSG COMMOK "Electricity Recovery at %s"
NOTIFYFLAG ONLINE SYSLOG+EXEC
NOTIFYFLAG ONBATT SYSLOG+EXEC
NOTIFYFLAG COMMBAD SYSLOG+EXEC
NOTIFYFLAG COMMOK SYSLOG+EXEC

RBWARNTIME 43200
NOCOMMWARNTIME 3600
FINALDELAY 5

Before patch (waiting for default kernel timeouts and tcp_syn_retries, blocking all work for 190 seconds and finally got COMMBAD for all upses):

  55.088474 pollups: remote@192.168.40.4
  55.088526 get_var: remote@192.168.40.4 / status
  60.088024 Poll UPS [remote@192.168.40.4] failed - Server disconnected
  60.088067 do_notify: ntype 0x0005 (COMMBAD)
  60.088084 Electricity Unknown at remote@192.168.40.4
  60.088339 Dropping connection to UPS [remote@192.168.40.4]
  60.088451 pollups: myups@localhost
  60.088488 get_var: myups@localhost / status
  60.088740 parse_status: [OL]
  60.088762 parsing: [OL]
  60.088775 pollups: snmp@localhost
  60.088787 get_var: snmp@localhost / status
  60.088897 parse_status: [OL]
  60.088918 parsing: [OL]
  60.089287 Current power value: 0
  60.089317 Minimum power value: 0
  65.089355 Trying to connect to UPS [remote@192.168.40.4]
 254.088019 UPS [remote@192.168.40.4]: connect failed: Connection failure: Connection timed out
 254.088069 pollups: myups@localhost
 254.088089 get_var: myups@localhost / status
 254.088190 SIGPIPE: dazed and confused, but continuing...
 254.088235 Poll UPS [myups@localhost] failed - Write error: Broken pipe
 254.088252 do_notify: ntype 0x0005 (COMMBAD)
 254.088265 Electricity Unknown at myups@localhost
 254.088476 Dropping connection to UPS [myups@localhost]
 254.088524 pollups: snmp@localhost
 254.088542 get_var: snmp@localhost / status
 254.088608 SIGPIPE: dazed and confused, but continuing...
 254.088656 Poll UPS [snmp@localhost] failed - Write error: Broken pipe
 254.088673 do_notify: ntype 0x0005 (COMMBAD)
 254.088688 Electricity Unknown at snmp@localhost
 254.089331 Dropping connection to UPS [snmp@localhost]
 254.091799 Current power value: 0
 254.091918 Minimum power value: 0
 259.092149 Trying to connect to UPS [remote@192.168.40.4]

After patch (Pause 10 seconds on unreachable entry, other connections still alive):

  20.045464 pollups: remote@192.168.40.4
  20.045523 get_var: remote@192.168.40.4 / status
  25.042222 Poll UPS [remote@192.168.40.4] failed - Server disconnected
  25.042285 do_notify: ntype 0x0005 (COMMBAD)
  25.042302 Electricity Unknown at remote@192.168.40.4
  25.042513 Dropping connection to UPS [remote@192.168.40.4]
  25.042572 pollups: myups@localhost
  25.042650 get_var: myups@localhost / status
  25.042869 parse_status: [OL]
  25.042908 parsing: [OL]
  25.042972 pollups: snmp@localhost
  25.043004 get_var: snmp@localhost / status
  25.043139 parse_status: [OL]
  25.043155 parsing: [OL]
  25.043641 Current power value: 0
  25.043656 Minimum power value: 0
  30.043714 Trying to connect to UPS [remote@192.168.40.4]
  40.046171 UPS [remote@192.168.40.4]: connect failed: Unknown error
  40.046229 pollups: myups@localhost
  40.046254 get_var: myups@localhost / status
  40.046473 parse_status: [OL]
  40.046490 parsing: [OL]
  40.046508 pollups: snmp@localhost
  40.046524 get_var: snmp@localhost / status
  40.046641 parse_status: [OL]
  40.046657 parsing: [OL]
  40.046948 Current power value: 0
  40.046961 Minimum power value: 0
  45.047023 Trying to connect to UPS [remote@192.168.40.4]
  55.046134 UPS [remote@192.168.40.4]: connect failed: Unknown error
  55.046176 pollups: myups@localhost
  55.046202 get_var: myups@localhost / status
  55.046353 parse_status: [OL]
  55.046371 parsing: [OL]

So, if timeout is defined since 587d5f8, why not to use it?!
Patch tested and working.

…te monitor systems is unreachable that causing loss all other communications, maybe later should make it optional config parameter
@clepple
Copy link
Member

clepple commented Jan 10, 2014

Hmm, this exposes another problem: when using the timeout patch, it is not obvious that a timeout has occurred. At the very least, this feature should fix the "connect failed: Unknown error" message.

I'm not especially happy with the connect timeout code anyway - it was added to support nut-scanner. But in the long run, if we are changing effective TCP timeout behavior, that timeout should be configurable.

aquette pushed a commit to aquette/nut that referenced this pull request Jun 28, 2016
@aquette
Copy link
Member

aquette commented Mar 1, 2019

trying to resurrect the discussion:

@clepple @zykh @jimklimov thoughts and comments?

@jimklimov
Copy link
Member

jimklimov commented Mar 1, 2019

Just to be sure, this is orthogonal to rapid data stale-unstale sequences reported a few weeks ago (as #651 OTOH)?

jimklimov added a commit to jimklimov/nut that referenced this pull request Jun 7, 2019
Debian packaging: explicitly request libssl*-dev so OBS is not confus…
@jimklimov jimklimov added the outdated-CI PR was made without rights for NUT Team to update its source branch, proposed codebase has no/old CI label Oct 9, 2020
@jimklimov jimklimov removed the outdated-CI PR was made without rights for NUT Team to update its source branch, proposed codebase has no/old CI label Sep 15, 2021
@jimklimov jimklimov added this to the 2.8.1 milestone Apr 1, 2022
@jimklimov
Copy link
Member

Looking at this PR open for so long (and with useful discussion above), I think this particular change will not be merged due to reasons articulated above, but a similar feature (with configuration, docs and all) can be considered in a future release.

@jimklimov jimklimov marked this pull request as draft April 3, 2022 02:26
@jimklimov jimklimov modified the milestones: 2.8.1, 2.8.2 Sep 14, 2023
@jimklimov jimklimov added enhancement service/daemon start/stop General subject for starting and stopping NUT daemons (drivers, server, monitor); also BG/FG/Debug upsmon labels Sep 14, 2023
@jimklimov jimklimov modified the milestones: 2.8.2, 2.8.3 Apr 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement service/daemon start/stop General subject for starting and stopping NUT daemons (drivers, server, monitor); also BG/FG/Debug upsmon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants