Reboot broken on Ubuntu 16.04 hosts #1488

hyperknot · 2016-07-18T15:08:36Z

The built-in reboot() function has been working perfectly on both Ubuntu 14.04 and FreeBSD 10.x hosts, but is broken on Ubuntu 16.04 hosts.

What is happening on Ubuntu 14.04:
I receive output like this and the system reboots; after the reboot, Fabric reconnects.

[ubuntu] out:
[ubuntu] out:
[ubuntu] out: Broadcast message from root@ubuntu
[ubuntu] out:
[ubuntu] out:   (/dev/pts/0) at 15:02 ...
[ubuntu] out:
[ubuntu] out:
[ubuntu] out:
[ubuntu] out:
[ubuntu] out: The system is going down for reboot NOW!
[ubuntu] out:
[ubuntu] out:

What is happening on Ubuntu 16.04:

There is no output at all from the command.
The system actually starts rebooting (still no output in Fabric).
The system finishes rebooting, but Fabric doesn't realise it: it does not reconnect, and there is still no output.
Fabric just sits there waiting seemingly forever.

If I press the Enter key in this state, Fabric actually continues, but first shows this message:

No handlers could be found for logger "paramiko.transport"
Warning: sudo() received nonzero return code -1 while executing 'reboot'!

I am using this code for reboot:

import time
from fabric.api import reboot, settings

def reboot_():
    with settings(warn_only=True):
        print 'rebooting'
        start_time = time.time()
        reboot(wait=1200)
        print 'reboot took: {} seconds'.format(time.time() - start_time)


hyperknot · 2016-07-18T15:11:15Z

It is exactly the same with run('reboot')

bitprophet · 2016-07-19T23:52:02Z

It being the same with a manual run is unsurprising - clearly something changed regarding Ubuntu's handling of reboot, SSH connections, etc.

Nothing obvious springs to mind, but reboot() (Fab's, not Linux's) is pretty basic - it simply calls sudo('reboot'), and temporarily tweaks Fabric's general reconnection settings so it can handle reconnecting after a nontrivial reboot sequence (versus the default, which would give up pretty quickly).

See fabric/fabric/operations.py, line 1244 in c0224a5:

def reboot(wait=120, command='reboot', use_sudo=True):

You might want to try tweaking that.

Also try enabling Paramiko's logging (see bottom of our troubleshooting page - http://www.fabfile.org/troubleshooting.html) as it might yield a clue.

bitprophet · 2016-07-19T23:59:05Z

Actually, on second thought, it sounds like Ubuntu's reboot is somehow never exiting or submitting an exit code to Fabric's execution handlers (run/sudo), since you note that sudo is what gets mad when you mash Enter after waiting.

If you look at the reboot() code, it expects the sudo('reboot') call to exit eventually, so that it can A) wait a bit and B) initiate reconnection.

The fact that, on Fabric's end, execution is just hanging out within the sudo means something remotely is violating that expectation. Kind of strange. Maybe a bug in Fabric itself, but feels more like bad behavior on the remote end. (P.S.: which fabric version(s) are you seeing this on?)

Offhand thought - we could perhaps set timeout= on the sudo, then except TimeoutException: pass around it. This would ensure that even in this (strange) situation, we default to trying a reconnect.

Only downside would be the case where reboot is actually hanging and the system is not truly rebooting, but it's not like we'd make things any worse for that case by the above change - the infinite hang would just happen on the connection loop instead of within the sudo.
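
The timeout-then-reconnect idea above can be sketched without Fabric at all. This is only an illustration under stated assumptions: `CommandTimedOut`, `run_reboot`, and `reconnect` are hypothetical names standing in for the real timeout exception, the sudo call with timeout= set, and the reconnection step; none of them are Fabric APIs.

```python
class CommandTimedOut(Exception):
    """Hypothetical stand-in for the timeout error the remote call raises."""


def reboot_then_reconnect(run_reboot, reconnect, timeout=10):
    """Issue the reboot, tolerate a hung command, then always try to reconnect.

    run_reboot and reconnect are hypothetical callables supplied by the caller;
    run_reboot(timeout) must raise CommandTimedOut if the command never exits.
    Returns True if the command exited on its own, False if it timed out.
    """
    try:
        run_reboot(timeout)
        exited_cleanly = True
    except CommandTimedOut:
        # The remote end never reported an exit status; carry on regardless.
        exited_cleanly = False
    # Reconnection is attempted in both cases, so a silent hang inside the
    # command can no longer block the whole task.
    reconnect()
    return exited_cleanly
```

As noted above, if the host is not really rebooting, the wait simply moves from the command into the reconnection step.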

hyperknot · 2016-07-20T21:05:33Z

Another really strange change in behaviour in Ubuntu 16.04 is the following. When I run poweroff in an SSH session, the machine does power off, but the SSH session hangs! There is no way to Ctrl + C, or Ctrl + D, or anything. All I can do is wait a long time until ssh aborts with:
packet_write_wait: Connection to 192.168.56.11: Broken pipe

I'm not deeply familiar with SSH connection handling, but this might be exactly the same issue as with reboot.

fillest · 2016-09-06T14:51:51Z

I've just run into a broken reboot (fresh, up-to-date Ubuntu 16.04 on AWS, Fabric==1.12.0), but in a different way. For me it just throws:

Fatal error: sudo() received nonzero return code -1 while executing!

Requested: reboot
Executed: sudo -S -p 'sudo password:'  /bin/bash -l -c "reboot"

Running sudo reboot in terminal by hand works (host reboots).

fillest · 2016-09-06T15:49:42Z

May be worth noting:

$ readlink /sbin/reboot 
/bin/systemctl
$ readlink /sbin/shutdown
/bin/systemctl
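
To check this from Python rather than the shell, something like the following could be run on the host itself. It is a small sketch; `resolves_to_systemctl` is a hypothetical helper name, and the check only looks at where the symlink points.

```python
import os


def resolves_to_systemctl(path="/sbin/reboot"):
    """Return True if path is a symlink whose final target is named systemctl."""
    if not os.path.islink(path):
        return False
    # realpath follows the whole symlink chain to the final target.
    return os.path.basename(os.path.realpath(path)) == "systemctl"
```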

fillest · 2016-09-06T16:46:49Z

And another weird thing. I've changed the rebooting code to use aws-cli and after its call (which takes ~1sec, seems like it's asynchronous) I run sudo('add-apt-repository --yes ppa:nginx/stable'). It has always worked, but now after reboot it returned -1 too:

sudo: add-apt-repository --yes ppa:nginx/stable

Fatal error: sudo() received nonzero return code -1 while executing!

Requested: add-apt-repository --yes ppa:nginx/stable
Executed: sudo -S -p 'sudo password:'  /bin/bash -l -c "add-apt-repository --yes ppa:nginx/stable"

Then I tried to make Fabric reconnect by adding fabric.network.disconnect_all(). It resulted in a password prompt (why??):

[...] sudo: add-apt-repository --yes ppa:nginx/stable
[...] Login password for 'ubuntu':

And it started to work only after I added e.g. time.sleep(60 * 3) after reboot. Which is obviously a poor band-aid, and now I'm puzzled how to properly handle the password problem. Looks like it's related to this issue.

ploxiln · 2016-10-04T00:21:21Z

The problem seems to be that "reboot" is now sometimes "too fast": the connection goes away before the status of the command gets back over the ssh connection.

(Tip: If you're at a frozen ssh connection as a result: type \n~. aka enter, tilde, period. The tilde is the default ssh escape character, recognized only right after a newline, and the period is ssh's disconnect command. If you just try ctrl-c or ctrl-d, ssh tries to pass that to the process running on the other side.)

One solution is to use shutdown -r +1, which will schedule the reboot for the next minute, and then wait a minute for it to start, and then start trying to re-connect. Admittedly, waiting a minute is not great.

A hacky thing to try: shutdown -r +0 should be equivalent to reboot, but in my limited tests of Ubuntu-16.04 running in VirtualBox, it tends to give a fraction of a second longer, showing the next shell prompt just before disconnecting a manual ssh session.
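
The schedule-then-wait approach above could also poll the SSH port instead of sleeping for a fixed time. A rough sketch using only the Python 3 standard library; `port_is_open` and `wait_for_port` are hypothetical helper names, and an open port only shows that something is listening, not that logins already work.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def wait_for_port(host, port, want_open, deadline=120.0, interval=1.0):
    """Poll until the port reaches the wanted state; False if deadline passes."""
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        if port_is_open(host, port) == want_open:
            return True
        time.sleep(interval)
    return False
```

One possible use, assuming sshd listens on port 22: after issuing the reboot command, call wait_for_port(host, 22, want_open=False) to see the host go down, then wait_for_port(host, 22, want_open=True) before reconnecting. If the reboot is quick enough the closed state may be missed, so a False result from the first call shouldn't be treated as fatal.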

ploxiln · 2016-10-04T19:42:41Z

This is probably a duplicate of #1444.

palbee · 2016-11-01T16:04:42Z

If the init daemon is switched to upstart, reboot works as expected. It looks like systemd is killing sshd immediately.

alexkiousis · 2016-11-11T00:12:35Z

There was a bug in the Debian/Ubuntu systemd package that, on shutdown, killed the network service before the SSH one, so everything hung.
It was fixed in the latest point release. I don't know the status of the Ubuntu package.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=751636

hyperknot · 2016-11-26T14:20:50Z

Reported the bug for Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1645002

stefan-wegener · 2016-12-05T11:10:24Z

I also had issues with reboot() in some of my scripts. I found that when connecting with a password, the reboot worked correctly, but when using keyfile authentication, the connection hung (and the reboot was done).

ploxiln · 2017-02-04T00:40:31Z

The Ubuntu bug https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1645002 is marked as fixed in 16.10, but not yet in 16.04, and it is unclear when it will be.

The current behavior for me is that paramiko/fabric instantly detect that the ssh connection was closed, but this happens before paramiko/fabric sees the reboot command complete. At least it doesn't hang indefinitely as in the original report.

Fatal error: sudo() received nonzero return code -1 while executing!
...
Aborting.

Plain reboot() did that consistently for me in a handful of tests against AWS EC2 and a local virtualbox VM. (I always used keyfile auth.)

I've found a short and elegant workaround, as I suggested without as much detail above:

reboot(command="shutdown -r +0")

That worked as expected for me (in my handful of tests against AWS EC2 and local virtualbox VM, all running up-to-date ubuntu 16.04). Note that "shutdown -r now" behaved like "reboot" and did not seem to work.

I took a quick look at the FreeBSD and OpenBSD man pages, and it looks like they have a shutdown command that supports those parameters. I suspect that the command "shutdown -r +0" would work on pretty much any unix system on which "reboot" worked. So it could be considered for changing the default command, or updating the documentation. (But I'd be interested to see a report of a test on a BSD system first.)

ambsw-technology · 2017-07-05T19:57:18Z

shutdown -r +0 isn't enough for us. Since reboot doesn't accept a manual timeout, I've even tried something like:

from time import sleep
from fabric.api import sudo
from fabric.exceptions import NetworkError

try:
    sudo("shutdown -r +0", timeout=300)
except NetworkError:
    pass
# in case the sudo times out during reboot
sleep(15)

Despite all of this hand waving, the next command hangs indefinitely. Is it possible that the connection pool is holding onto (and using) the dead connection? If so, is there a workaround? Can I temporarily reduce the connection-level timeout?

ploxiln · 2017-07-05T20:05:30Z

Indeed, you need to replace the existing connection, the way reboot() does:

https://github.com/fabric/fabric/blob/1.13.2/fabric/operations.py#L1289-L1294
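
For anyone rolling their own version of this, the general shape (drop the dead connection, then retry opening a new one within a time budget) might look like the following. It is a sketch, not the code behind the link: `connect_with_retries` and the `connect` callable are hypothetical, and `OSError` stands in for whatever error a failed connection attempt raises in your setup.

```python
import time


def connect_with_retries(connect, wait=120, per_attempt=5, sleep=time.sleep):
    """Call connect() until it succeeds or roughly `wait` seconds are used up.

    connect is a hypothetical callable that opens a brand-new connection and
    raises OSError on failure; per_attempt is the pause between tries.
    """
    # Turn the total time budget into a whole number of attempts (at least 1).
    attempts = max(1, int(round(float(wait) / per_attempt)))
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # budget exhausted; let the caller see the failure
            sleep(per_attempt)
```

The important part is that connect() creates a new connection each time rather than reusing a cached one, since a cached connection to the rebooted host is dead.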

ecnepsnai · 2017-12-11T19:42:10Z

Apologies for reviving an old issue, but I can also confirm that this problem happens when attempting to reboot an LXC container. @ploxiln's suggestion of using command="shutdown -r +0" did work for us.

tehfink · 2018-02-07T09:14:04Z

Confirming this error on a fresh install of FreeBSD 11.1 with bash installed:

reboot(wait=1) results in:

Fatal error: sudo() received nonzero return code -1 while executing!

Requested: reboot
Executed: sudo -S -p 'sudo password:'  /usr/local/bin/bash -l -c "reboot"

Aborting.
Traceback (most recent call last):
…
    raise env.abort_exception(msg)
hosts.FabricException: sudo() received nonzero return code -1 while executing!

aggieNick02 · 2019-02-07T17:45:42Z

I ended up needing this to get things going after reading @ambsw-technology's and @ploxiln's comments. I'm running against an Ubuntu 16.04 LTS server (from a Windows client).

import time
import fabric.state
from fabric.api import env, sudo
sudo('shutdown -r +0')
time.sleep(30)
fabric.state.connections.connect(env.host_string)

aggieNick02 · 2019-06-17T18:05:45Z

FYI, I still see this against 18.04.2 LTS servers.

cgd1 · 2020-04-23T12:30:26Z

Any fix for this? I'm also getting this issue with 16.04.

bitprophet added Core Bug Needs investigation labels Jul 19, 2016

bitprophet mentioned this issue Nov 29, 2016: sudo("reboot") hangs against local KVM #1444 (Closed)

ploxiln mentioned this issue Apr 14, 2017: Strange behaviour with host behind jump box (ProxyCommand ssh -W) #1588 (Open)

ploxiln mentioned this issue Jul 22, 2017: How to reboot with parallel #1635 (Closed)
