SSH Error: Shared connection to x.x.x.x closed. #10616
I can't reproduce this as written (what version of Ansible were you using?), but I can with this:
This seems to be because of a race between the ssh command finishing and the ssh server being shut down. Looking at the source, it looks like I can get the same failure with 1.8.1, so probably something in your setup or Ansible has tipped the balance of the race condition for your task. Maybe https://support.ansible.com/hc/en-us/articles/201958037-Reboot-a-server-and-wait-for-it-to-come-back just needs to be changed to say:
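The pattern that article describes is roughly the following (my sketch of the commonly cited 1.x-era recipe, not the article's exact text; timeouts are placeholders):

```yaml
- name: Reboot the server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true

- name: Wait for the server to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=300
  sudo: false
```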
Thanks for trying, jder! I'm running version 1.9.0.1, both on Mac OS X 10.10.2 (installed via brew) and on a CentOS 7 server (installed via pip). One thing I omitted from my post was that I only run this when a variable is set, so the full play looks like:
I didn't think that'd make a difference, though. I also tried setting async: 1, but that did not resolve my problem.
@darrylc Can you show a complete playbook & full output? (Maybe with the issue template?) I'm able to run your task without error with the same OS X and Ansible versions. Do you have a task after the reboot task? For example, this fails, regardless of the value of
Because even though the first task succeeds, the second one fails with the error you're seeing.
Sure. This is the full contents of the playbook, but it's included around other things. If you'd like to see some of the other plays above/below, let me know.
Here's an output:
Previously, in 1.8.1, I got this output:
You might try running with
The task before the reboot is:
Here's the verbose output of those two plays:
I should also note that I have this in my ssh config during the play:
So, the known hosts message shouldn't be a reason to abort the play.
I was able to reproduce this with the following playbook:
Despite the async: 1 (or 0) and the ignore_errors, this produces the same ssh error you're seeing. I'm looking more into it.
I think what's going on is that when you launch an async process on the remote host (with
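Conceptually, the async launcher detaches the command from the SSH session so the connection can close before the command finishes. A rough stand-in for that behaviour (my sketch, not Ansible's actual async wrapper):

```shell
out=$(mktemp)

# Launch the command detached from the controlling session, as async does;
# the SSH connection could now close while the child keeps running.
nohup sh -c "sleep 1 && echo rebooted > '$out'" >/dev/null 2>&1 &

# A real reboot task would return immediately here (poll: 0); we wait
# only so this demo is self-contained and can show the result.
wait
cat "$out"
```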
Hmm, so that seems to advance my scripts, but it doesn't seem to reboot my server.
Ah, sorry, you'll probably need to ask your
Server is still not rebooting. The delay is happening, but I suspect that the 'sleep 2 && shutdown -r now "Ansible updates triggered"' command isn't actually working.
It looks like running 'sleep 2 && shutdown -r now' via sudo requires a password, while running 'shutdown -r now' via sudo does not. Running 'sleep 2 && shutdown -r now' as root doesn't require a password either. I realize this might be outside the scope of Ansible, but any ideas?
That's very strange. Perhaps your sudoers configuration is set up to only allow certain commands to be run without a password?
Nope, sudo for that user (at that time) has full access.
Well, it was a long shot; I don't think Ansible does
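One plausible explanation (my note, not from the thread): in a shell, && binds at the top level, so in `sudo sleep 2 && shutdown -r now` only the sleep runs under sudo, and the shutdown runs as the ordinary user. A sudo-free demonstration of that binding, using a hypothetical wrapper function in place of sudo:

```shell
# "wrapper" stands in for sudo: it runs its argv with WRAPPED=yes set.
wrapper() { WRAPPED=yes "$@"; }

# Only the first command is wrapped; '&&' starts a new, unwrapped command.
split=$(wrapper sh -c 'printf "%s" "${WRAPPED:-no}"' && sh -c 'printf " %s" "${WRAPPED:-no}"')

# Quoting the whole pipeline keeps both commands under the wrapper.
grouped=$(wrapper sh -c 'printf "%s" "${WRAPPED:-no}" && printf " %s" "${WRAPPED:-no}"')

echo "$split"    # yes no
echo "$grouped"  # yes yes
```

The fix in sudo terms would be `sudo sh -c 'sleep 2 && shutdown -r now'`, which is effectively what Ansible's become/sudo wrapping does around a shell task.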
sleep by itself, or shutdown by itself, seems to work.
What happens if you just run it via Ansible synchronously?
Goes back to the original "fatal: [x.x.x.x] => SSH Error: Shared connection to x.x.x.x closed."
So, that clearly reboots. But when you add the
Hmm, might be working now. I changed it from a command: task to a shell: task. I hadn't noticed the discrepancy between our plays until now. I'll continue to test.
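That fits: the command module execs its argv directly with no shell in between, so operators like && reach the first program as literal arguments, while the shell module hands the whole line to /bin/sh. Illustrated without Ansible (my sketch; the echoed words are placeholders):

```shell
# No shell: "&&" is just another argument passed to echo
# (this is how command-style argv execution sees it).
direct=$(echo sleep2 '&&' reboot)

# Through a shell: "&&" chains two separate commands
# (this is how the shell module runs the line).
via_shell=$(sh -c 'echo sleep2 && echo reboot')

echo "$direct"     # sleep2 && reboot
echo "$via_shell"
```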
Yes, seems to be working! Thanks so much, jder! Here's the playbook:
Great! Glad to hear it.
Hi ... I can't understand why it is not waiting.
I know that this ticket is closed, but I thought that I'd add my solution for any Googlers out there. Instead of:
I used:
@darrylc Long story short, I'd rather try (you don't really need sudo to sleep for two seconds):
Same problem in ansible-1.9.1; I don't know why there needs to be a sleep 2.
shutdown -r +2? The wait will be built in.
Brian Coca
For posterity, the reason I wasn't using
No ... that's the default behaviour on Linux distros :) it's in minutes; use sleep 2 && shutdown -r
Ah, did not realize the +# was in minutes ... too used to the current fast ...
Brian Coca
Same error.
Rebooting the system gives the error and the connection is lost; on version 1.9.1.
Hi, I am running into the same issue and have had no success for a few days. Any help is appreciated. "fatal: [54.184.91.116]: FAILED! => {
Have you tried my answer above?
It won't work since 1.9.1, I guess. But this one will still work:
This is a known issue; I think it will be fixed in 2.0 ... I am using
How could I reboot the server using this technique with an ad-hoc command line? I've tried the command below, but no success even though it shows a success message. The VM isn't rebooted... $ ansible tag_v3_api_update_True -B 1 -P 0 -b -v -i ec2.py -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'
background launch...
54.94.167.25 | success >> {
"ansible_job_id": "601411637602.2026",
"results_file": "/root/.ansible_async/601411637602.2026",
"started": 1
}
My mistake... this is the correct way:
ansible tag_v3_api_update_True -B 1 -P 0 -b -v -i ec2.py -m shell -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'
Now it works.
Delete the files on the Ansible server from where you are running the commands: cd $HOME/.ansible/cp/ ; rm -rf
Hello, it clearly depends on the operating system. Moving from Debian 7 (Wheezy) to 8 (Jessie) on a Vagrant box showed this issue. I've upgraded from Ansible 1.9.4 to 2.0.0.2 and I get the exact same behaviour on both versions. If I had to guess, I'd say the order in which the services are stopped has been changed, and that is causing this.
Is there any way we can gather the facts again after the machine reboots and we wait for it to come back? Once the machine reboots, I need to gather the facts again before moving on with the next plays. Any thoughts?
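One way (a sketch, with placeholder timeout values): after the wait, re-run fact gathering explicitly with the setup module:

```yaml
- name: Wait for the host to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=300

- name: Re-gather facts now that the machine is back
  setup:
```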
I am now seeing this with 1.9.4 but not with shell. This does not always fail, but often enough.
I had a task that worked on ansible < 2.1.0.0 start to fail in this way with 2.1.0.0. The solution from @jder above worked for me.

Old task:

- name: 'Restarting host machine(s) (Shows errors - OK to ignore!)'
  command: shutdown -r now
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

New version that works with 2.1.0.0:

- name: 'Restarting host machine(s) (Shows errors - OK to ignore!)'
  shell: sleep 2 && shutdown -r now
  async: 1
  poll: 0
  ignore_errors: true
  become: yes
I think the most graceful way to wait for a reboot to complete would be to take a note of the boot time prior to issuing a reboot, and wait for the boot time to change. Some systems take a long time to complete shutdown, and assuming that shutdown takes less than a couple of minutes before SSH becomes unreachable is asking for problems. Not sure how to code that in Ansible yet. I wonder how portable the command 'uptime --since' is in the Linux world... it doesn't appear to be present in RHEL5 or RHEL6, and makes an appearance at RHEL7. At least the following should be highly portable on Linux systems:
or if you want something more human readable
or something with ISO8601 date format, as an ad-hoc shell command
Output is like: 16-06-24T14:58:05
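For example (my sketch, not necessarily the commands elided above): the btime field in /proc/stat gives the boot time in seconds since the epoch on old and new Linux kernels alike, unlike `uptime --since`, and GNU date can render it as ISO8601:

```shell
# Boot time as seconds since the epoch, read from the kernel directly.
btime=$(awk '/^btime /{print $2}' /proc/stat)
echo "$btime"

# Human-readable / ISO8601 rendering (GNU date):
date -d "@$btime" '+%Y-%m-%dT%H:%M:%S'
```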
It seems like there's enough complexity here for a dedicated module. There's a Windows one, contributed by @nitzmahone here: #15314, shipping with 2.1, docs here: https://github.com/ansible/ansible-modules-core/pull/3376/files - and an issue request for a cross-platform module here: #16186
@cameronkerrnz I went for a combination in win_reboot. I actually added the ansible_lastboot fact to Windows for that exact purpose, but ended up not shipping the version of win_reboot that used it for various reasons (instead waiting for the port to go down, then back up, then for a "canary" command to succeed over WinRM). I might still revisit that at some point. At least on WinRM, it's expensive to wait for the connect timeouts and stuff if the port's not open and responding; ssh might be a little cheaper there, but it's hard to tell without just trying (plus some systems refuse connections until SSH is ready, while others will accept and tarpit/block).

I think the only real question outstanding is how generic we want this to be. E.g., do we want to have a Python dep on the client, or just make it purely ssh/shell/command-based, so it can potentially be used on things like switches/routers/embedded devices? I'd kinda lean toward the latter (where you could override the command that gets sent for your platform of choice), but that might somewhat limit some of the other behaviors (e.g., calculating "yes, we actually rebooted" by sampling an uptime cmd, etc.).
@nitzmahone As a useful point of self-imposed abstraction to make later room for switches/etc., might I suggest the module be named something like posix_reboot? Presumably, all such platforms would have to have Python 2.4 anyway, no?

One thing to keep in mind, which helps to limit the scope of what this module could reasonably achieve, is that noting/comparing the boot time is in itself only useful for answering the question "have we rebooted yet [or is something still taking ages to shut everything down]". I'm not convinced there is a good way of answering the question "are we ready to resume our play yet"... at least not without some knowledge of the platform deployment, so that should reasonably be left up to the play writer. E.g., in RHEL6 or any other SysV init system, even if you can SSH into the machine while it is booting, it might still be busy running fsck ... but I suppose if you have a

But certainly you can expect to need something that implements multiple strategies, like the
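A play-level sketch of that multi-strategy wait (port goes down, port comes back, then a canary command confirms the system is really usable; all values are placeholders):

```yaml
- name: Wait for SSH to stop answering (the host is actually going down)
  local_action: wait_for host={{ inventory_hostname }} port=22 state=stopped timeout=120

- name: Wait for SSH to answer again
  local_action: wait_for host={{ inventory_hostname }} port=22 state=started delay=10 timeout=600

- name: Canary command to prove the box is up, not just accepting connections
  command: /bin/true
```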
This workaround solved the issue on my Fedora 24 with ansible-2.2.0.0 managing a CentOS 7.3 virtual machine after a yum update. Using the shell module is essential because && is shell syntax, which the command module does not interpret.
It has solved my problem! Thank you!
With the "raw" module (used when Python is not installed or usable), this sleep hack doesn't work, and async is also not usable. I tried a lot of different command lines, and the only one that finally worked was:
What is the source of this problem? |
While attempting to do a reboot, using a playbook containing:
I get the following error, and the playbook ends:
In previous versions (e.g. 1.8.1), the above playbook continues, ignoring the error.