SSH Error: Shared connection to x.x.x.x closed. #10616

Closed
darrylc opened this issue Apr 4, 2015 · 55 comments

Comments

@darrylc

darrylc commented Apr 4, 2015

While attempting to do a reboot, using a playbook containing:

- name: Reboot
  command: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true

I get the following error, and the playbook ends:

fatal: [x.x.x.x] => SSH Error: Shared connection to x.x.x.x closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

In previous versions (e.g. 1.8.1), the playbook above continued, ignoring the error.

@jder
Contributor

jder commented Apr 4, 2015

I can't reproduce this as written (what version of Ansible were you using?), but I can with this:

- name: Reboot
  shell: 'shutdown -r now "Ansible updates triggered" && sleep 10'
  async: 0
  poll: 0
  ignore_errors: true

This seems to be because of a race between the ssh command finishing and the ssh server being shut down. Looking at the source, it looks like async: 0 means "run synchronously", so it's not surprising this fails. If you change it to async: 1, it works fine for me.

I can get the same failure with 1.8.1, so probably something in your setup or Ansible has tipped the balance of the race condition for your task.

Maybe https://support.ansible.com/hc/en-us/articles/201958037-Reboot-a-server-and-wait-for-it-to-come-back just needs to be changed to say async: 1 rather than async: 0?

@darrylc
Author

darrylc commented Apr 4, 2015

Thanks for trying, jder! I'm running version 1.9.0.1, both on Mac OS X 10.10.2 (installed via brew) and on a CentOS 7 server (installed via pip). One thing I omitted from my post is that I only run this task when a variable is set, so the full task looks like:

- name: Reboot
  command: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true
  when: runUpdates

I didn't think that'd make a difference though.

I also tried setting async: 1 but that did not resolve my problem.

@jder
Contributor

jder commented Apr 4, 2015

@darrylc Can you show a complete playbook & full output? (Maybe with the issue template?)

I'm able to run your task without error with the same OS X and Ansible versions. Do you have a task after the reboot task? For example, this fails, regardless of the value of async:

---
- 
  hosts: all
  vars: 
    runUpdates: true
  tasks:
    - name: Reboot
      command: shutdown -r now "Ansible updates triggered"
      async: 1
      poll: 0
      ignore_errors: true
      when: runUpdates

    - name: After reboot
      shell: 'sleep 5 && echo hi'

Even though the first task succeeds, the second one fails with the error you're seeing.

@darrylc
Author

darrylc commented Apr 4, 2015

Sure. This is the full contents of the playbook, but it's included around other things. If you'd like to see some of the other plays above/below, let me know.

---
- name: Reboot
  command: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true
  when: runUpdates

- name: Waiting for all app servers
  local_action: wait_for host={{ inventory_hostname }}
                port=22 state=started
  sudo: false
  when: runUpdates

Here's the output:

TASK: [common | Reboot] *******************************************************
fatal: [10.10.0.233] => SSH Error: Shared connection to 10.10.0.233 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.0.234] => SSH Error: Shared connection to 10.10.0.234 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.249] => SSH Error: Shared connection to 10.10.1.249 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.248] => SSH Error: Shared connection to 10.10.1.248 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.12] => SSH Error: Shared connection to 10.10.1.12 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/centos/site.retry

10.10.0.233                : ok=17   changed=16   unreachable=1    failed=0
10.10.0.234                : ok=17   changed=16   unreachable=1    failed=0
10.10.1.12                 : ok=17   changed=16   unreachable=1    failed=0
10.10.1.248                : ok=17   changed=16   unreachable=1    failed=0
10.10.1.249                : ok=17   changed=16   unreachable=1    failed=0
127.0.0.1                  : ok=25   changed=8    unreachable=0    failed=0

Previously, in 1.8.1, I got this output:

TASK: [nat | Reboot] **********************************************************
failed: [x.x.x.x] => {"failed": true, "parsed": false}
SUDO-SUCCESS-gaixedfvfwciqldgvvrcxtieejnprhbe

...ignoring

TASK: [nat | Waiting for all app servers] *************************************
ok: [x.x.x.x -> 127.0.0.1]

@jder
Contributor

jder commented Apr 4, 2015

You might try running with -vvvv. I'd also be interested in the task & output that comes before the reboot task.

@darrylc
Author

darrylc commented Apr 4, 2015

The task before the reboot is:

- name: Set SELINUX to Permissive
  selinux: state=permissive policy=targeted

Here's the verbose output of those two tasks:

TASK: [common | Set SELINUX to Permissive] ************************************
<10.10.0.233> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.233> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015'
EXEC previous known host file not found for 10.10.0.233
<10.10.0.233> PUT /tmp/tmpk3ODZU TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015/selinux
<10.10.0.234> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.234> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427'
EXEC previous known host file not found for 10.10.0.234
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ekxspcioyrcyceijzspufsbujionstgk] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ekxspcioyrcyceijzspufsbujionstgk; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.233
<10.10.0.234> PUT /tmp/tmpXup5iQ TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427/selinux
<10.10.1.249> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.249> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988'
EXEC previous known host file not found for 10.10.1.249
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=luojdoizsqkbbeebaqqjmmnqrcoisljv] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-luojdoizsqkbbeebaqqjmmnqrcoisljv; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.234
<10.10.1.249> PUT /tmp/tmpaUM1Vx TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988/selinux
<10.10.1.248> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.248> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862'
EXEC previous known host file not found for 10.10.1.248
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=meugdinapovzmnnawghagziymmulwrka] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-meugdinapovzmnnawghagziymmulwrka; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.249
<10.10.1.248> PUT /tmp/tmp_DUedY TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862/selinux
<10.10.1.12> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.12> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330'
EXEC previous known host file not found for 10.10.1.12
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=dqiddmrxmpjrqpokdjwrwlrskglpedth] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-dqiddmrxmpjrqpokdjwrwlrskglpedth; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.248
<10.10.1.12> PUT /tmp/tmpqoHJx2 TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330/selinux
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=rvubjjqnipjqybarkttszpgpfruvaudr] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-rvubjjqnipjqybarkttszpgpfruvaudr; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.12
ok: [10.10.0.233] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.0.234] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.1.249] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.1.248] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.1.12] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}

TASK: [common | Reboot] *******************************************************
<10.10.0.233> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.233> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155'
EXEC previous known host file not found for 10.10.0.233
<10.10.0.233> PUT /tmp/tmprmK6QE TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155/command
<10.10.0.234> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.234> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405'
EXEC previous known host file not found for 10.10.0.234
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ghvlhadtwkcoouhxllakzdanqqrrhxox] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ghvlhadtwkcoouhxllakzdanqqrrhxox; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.233
<10.10.0.234> PUT /tmp/tmpE5WSr6 TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405/command
<10.10.1.249> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.249> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570'
EXEC previous known host file not found for 10.10.1.249
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=lfxcjguxnbvasijicgikabimychiqxqu] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-lfxcjguxnbvasijicgikabimychiqxqu; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.234
<10.10.1.249> PUT /tmp/tmp1swEiT TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570/command
<10.10.1.248> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.248> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214'
EXEC previous known host file not found for 10.10.1.248
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=txrdvprualxegtspglqctpdduavwvjmv] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-txrdvprualxegtspglqctpdduavwvjmv; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.249
<10.10.1.248> PUT /tmp/tmpmiLyZ6 TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214/command
<10.10.1.12> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.12> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752'
EXEC previous known host file not found for 10.10.1.12
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ajywauuhwsziswsyfutujlqmxquugbfm] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ajywauuhwsziswsyfutujlqmxquugbfm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.248
<10.10.1.12> PUT /tmp/tmpjCweZe TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752/command
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=lhxgthxuwqarmkukfrwjydenlojxxyfh] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-lhxgthxuwqarmkukfrwjydenlojxxyfh; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.12
fatal: [10.10.0.233] => SSH Error: Shared connection to 10.10.0.233 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.0.234] => SSH Error: Shared connection to 10.10.0.234 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.249] => SSH Error: Shared connection to 10.10.1.249 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.248] => SSH Error: Shared connection to 10.10.1.248 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.12] => SSH Error: Shared connection to 10.10.1.12 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/centos/site.retry

10.10.0.233                : ok=20   changed=3    unreachable=1    failed=0
10.10.0.234                : ok=20   changed=3    unreachable=1    failed=0
10.10.1.12                 : ok=17   changed=3    unreachable=1    failed=0
10.10.1.248                : ok=17   changed=3    unreachable=1    failed=0
10.10.1.249                : ok=17   changed=3    unreachable=1    failed=0
127.0.0.1                  : ok=25   changed=7    unreachable=0    failed=0

@darrylc
Author

darrylc commented Apr 4, 2015

I should also note that I have this in my ssh config during the play:

Host 10.*.*.*
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

So, the known hosts message shouldn't be a reason to abort the play.

@jder
Contributor

jder commented Apr 6, 2015

I was able to reproduce this with the following playbook:

---

- hosts: all
  sudo: true
  tasks:
    - name: kill connection
      command: killall sshd
      ignore_errors: 1
      async: 1
      poll: 0

Despite the async: 1 (or 0) and the ignore_errors, this produces the same ssh error you're seeing. I'm looking more into it.

@jder
Contributor

jder commented Apr 6, 2015

I think what's going on is that when you launch an async process on the remote host (with async: 1), what it does is run a small synchronous process which sleeps for 1 second and then returns a small amount of JSON, as well as actually starting the async job. The problem is that if the SSH connection is torn down before the small synchronous job finishes, Ansible treats this as an unreachable error. I'm not sure what the "right" solution is (and I don't see changes here since 1.8.1), but this workaround works for me:

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true
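A plain-shell sketch of the idea behind this workaround (not how Ansible's async wrapper is implemented): the launching command returns straight away, and the disruptive command only runs a couple of seconds later in the background, after the caller has had time to collect its result. launch_delayed is a hypothetical helper name, and true stands in for shutdown -r now so the example is safe to run.

```shell
#!/bin/sh
# launch_delayed DELAY CMD [ARGS...]   (hypothetical helper)
# Starts CMD in the background after DELAY seconds and returns immediately.
launch_delayed() {
  delay=$1
  shift
  # The subshell ignores SIGHUP (much as nohup would), so the pending job is
  # not killed when the session that started it closes; ignored signals stay
  # ignored in CMD after exec. Output is redirected so the caller (e.g. an
  # SSH session) is not left waiting on the job's open stdout/stderr.
  ( trap '' HUP; sleep "$delay" && "$@" ) > /dev/null 2>&1 &
}

# Returns at once; `true` runs about 2 seconds later.
# On a real host you might pass: shutdown -r now "Ansible updates triggered"
launch_delayed 2 true
echo "launcher returned"
```

The short delay is the whole trick: the command that Ansible is waiting on finishes before sshd goes away, instead of racing it.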

@darrylc
Author

darrylc commented Apr 6, 2015

Hmm, so that seems to advance my scripts, but it doesn't seem to reboot my server.

@jder
Contributor

jder commented Apr 6, 2015

Ah, sorry, you'll probably need to ask your wait_for command to delay for a few seconds before starting to poll now, since the reboot is now running asynchronously and delayed by 2 seconds:

- name: Waiting for all app servers
  local_action: wait_for host={{ inventory_hostname }}
                port=22 state=started delay=10
  sudo: false
  when: runUpdates
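Why the delay matters can be sketched in plain shell. This is a rough model of the pattern, not Ansible's wait_for implementation: wait DELAY seconds, then run a check once per second until it succeeds or about TIMEOUT seconds of polling pass. Without the initial delay, the first check can succeed against the sshd that is still running (the reboot itself is held back by sleep 2), so the play would move on before the host has even gone down. wait_until is a hypothetical name, and the nc -z check in the usage comment is an assumption, since flag support differs between netcat implementations.

```shell
#!/bin/sh
# wait_until DELAY TIMEOUT CHECK [ARGS...]   (hypothetical helper)
# Sleeps DELAY seconds, then runs CHECK once per second until it exits 0.
# Returns 0 on success, or 1 once about TIMEOUT seconds of polling have passed.
wait_until() {
  delay=$1
  timeout=$2
  shift 2
  sleep "$delay"
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example (assumes a netcat that supports -z for a bare port check):
#   wait_until 10 300 nc -z 10.10.0.233 22 || echo "host did not come back"
```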

@darrylc
Author

darrylc commented Apr 6, 2015

Server is still not rebooting. The delay is happening, but I suspect that the 'sleep 2 && shutdown -r now "Ansible updates triggered"' command isn't actually working.

@darrylc
Author

darrylc commented Apr 7, 2015

It looks like running 'sleep 2 && shutdown -r now' via sudo requires a password, while running 'shutdown -r now' via sudo does not. Running 'sleep 2 && shutdown -r now' as root doesn't require a password either. I realize this might be outside the scope of Ansible, but any ideas?

@jder
Contributor

jder commented Apr 7, 2015

That's very strange. Perhaps your sudoers configuration is set up to only allow certain commands to be run without a password?

@darrylc
Author

darrylc commented Apr 7, 2015

Nope, sudo for that user (at that time) has full access

@jder
Contributor

jder commented Apr 7, 2015

Well, it was a long shot; I don't think Ansible does sudo $COMMAND; it runs a script which then runs your command, so it would be hard to understand why one would work and the other wouldn't. Sorry, I really don't understand how that could require a password. Does just shell: shutdown -r now have the same problem?

@darrylc
Author

darrylc commented Apr 7, 2015

sleep by itself, or shutdown by itself, seems to work

@jder
Contributor

jder commented Apr 8, 2015

What happens if you just run it via Ansible synchronously?

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"

@darrylc
Author

darrylc commented Apr 8, 2015

Goes back to the original "fatal: [x.x.x.x] => SSH Error: Shared connection to x.x.x.x closed."

@jder
Contributor

jder commented Apr 8, 2015

So, that clearly reboots. But when you add the async: 1 and poll: 0, it no longer reboots?

@darrylc
Author

darrylc commented Apr 8, 2015

Hmm, might be working now. I changed it from a command: task to a shell: task. I hadn't noticed the discrepancy between our plays until now. I'll continue to test.
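The command vs shell discrepancy is most likely what kept the reboot from happening. Ansible's documentation notes that the command module does not pass its arguments through a shell, so operators like |, ; and & are not interpreted (the shell module is meant for that). With command: sleep 2 && shutdown -r now ..., sleep receives &&, shutdown and the rest as literal arguments and most likely fails, so shutdown never runs. A minimal, harmless sketch of the two behaviours, with echo standing in for the real commands (no_shell and via_shell are illustrative names):

```shell
#!/bin/sh
# Roughly what the `command` module does: split the line into words and run
# the first word directly. Here `&&` is quoted, so echo gets it as a plain
# argument, just as a program launched without a shell would.
no_shell() {
  echo first '&&' echo second
}

# What the `shell` module does: hand the whole line to /bin/sh, which treats
# `&&` as "run the second command if the first one succeeded".
via_shell() {
  /bin/sh -c 'echo first && echo second'
}

no_shell    # prints: first && echo second
via_shell   # prints: first, then second on its own line
```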

@darrylc
Author

darrylc commented Apr 8, 2015

Yes, seems to be working! Thanks so much jder!

Here's the playbook:

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  sudo: true
  ignore_errors: true
  when: runUpdates

- name: Waiting for all app servers
  local_action: wait_for host={{ inventory_hostname }}
                port=22 state=started delay=10
  sudo: false
  when: runUpdates

darrylc closed this as completed Apr 8, 2015
@jder
Contributor

jder commented Apr 8, 2015

Great! Glad to hear it.


@ghost

ghost commented Apr 12, 2015

Hi,
I have the same error. I am using roles and have included handler files, and I get the same error; I also tried a local handler/main.yml, with the same result.

I can't understand why it is not waiting.
Sometimes it works and sometimes it doesn't, and I can't understand why.

@jrm16020

I know that this ticket is closed, but I thought that I'd add my solution for any Googlers out there. Instead of:

shell: sleep 2 && shutdown -r now "Ansible updates triggered"

I used:

shell: /sbin/shutdown -r -t 3

@lynyrds

lynyrds commented May 11, 2015

@darrylc
As to your question about why "running 'sleep 2 && shutdown -r now' via sudo requires a password":
You can run either sleep 2 or shutdown -r now through sudo, but not both combined as one command, because && is interpreted by the shell, not by sudo.
There's no single command called 'sleep 2 && shutdown -r now'. Typed as sudo sleep 2 && shutdown -r now, only sleep 2 runs via sudo and shutdown -r now runs as your own user, which is most likely what prompts for a password.

Long story short, I'd rather try (you don't really need sudo to sleep for two seconds):
sleep 2 && sudo shutdown -r now
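To see which part of the line sudo actually receives, here is a harmless sketch in which a hypothetical fake_sudo function plays sudo's role: it gets one command and its arguments, and prints them tagged [root] instead of running them. This applies to typing the command by hand. In the -vvvv output earlier in this thread, Ansible wraps the whole module run in sudo ... /bin/sh -c '...', so with shell: and sudo: true both halves of && run as root.

```shell
#!/bin/sh
# fake_sudo is a hypothetical stand-in for sudo. Like sudo, it receives a
# single command plus its arguments; instead of running it, it prints it
# tagged [root].
fake_sudo() {
  echo "[root] $*"
}

# 1. The calling shell splits on `&&` first, so only `sleep 2` reaches
#    fake_sudo; `shutdown -r now` would run as the unprivileged user.
fake_sudo sleep 2 && echo "[user] shutdown -r now"

# 2. The suggested form: sleep unprivileged, elevate only shutdown.
echo "[user] sleep 2" && fake_sudo shutdown -r now

# 3. To elevate both, pass the whole line to a shell as a single argument.
fake_sudo sh -c 'sleep 2 && shutdown -r now'
```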

@BlueShells

Same problem in ansible-1.9.1. I don't know why there should be a sleep 2.

@bcoca
Member

bcoca commented May 21, 2015

What about shutdown -r +2? The wait would be built in.


@jder
Contributor

jder commented May 21, 2015

For posterity, the reason I wasn't using shutdown -r +2 is that my version of shutdown treats that as "2 minutes".

@ghost

ghost commented May 21, 2015

No... that's how the command works by default on Linux distributions :) The +N is in minutes, so use sleep 2 && shutdown -r.

@bcoca
Member

bcoca commented May 21, 2015

Ah, I didn't realize the +# was in minutes... too used to the current fast-paced world!


@inevity

inevity commented Jun 12, 2015

Same error:
ansible -vvvv 192.168.1.254 -m setup
<192.168.1.254> ESTABLISH CONNECTION FOR USER: root
<192.168.1.254> REMOTE_MODULE setup
<192.168.1.254> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 192.168.1.254 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1434089342.6-168328121475741 && echo $HOME/.ansible/tmp/ansible-tmp-1434089342.6-168328121475741'
192.168.1.254 | FAILED => SSH Error: Shared connection to 192.168.1.254 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

@OBJ-feye

Rebooting the system fails with a lost-connection error in version 1.9.1.

@anirudh-wa

Hi,

I am running into the same issue and have had no success for a few days. Any help is appreciated.

"fatal: [54.184.91.116]: FAILED! => {
"changed": false,
"failed": true,
"msg": "BECOME-SUCCESS-dbwbjmofefcssrvsteepiobrzztqvjsc\r\nTraceback (most recent call last):\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 1871, in <module>\r\n main()\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 91, in main\r\n module = CommandModule(argument_spec=dict())\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 535, in __init__\r\n self._check_for_check_mode()\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 1071, in _check_for_check_mode\r\n for (k,v) in self.params.iteritems():\r\nAttributeError: 'tuple' object has no attribute 'iteritems'\r\nOpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014\r\ndebug1: Reading configuration data /home/ubuntu/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 30268\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to 54.184.91.116 closed.\r\n",
"parsed": false
}"

@jder
Contributor

jder commented Jul 23, 2015

Have you tried my answer above?

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true

@lynyrds

lynyrds commented Jul 28, 2015

I guess it no longer works as of 1.9.1.

But this one will still work:

- name: reboot server
  shell: /bin/echo "/sbin/reboot" | /usr/bin/at now + 1 min

@ghost

ghost commented Aug 2, 2015

This is a known issue; I think it will be fixed in 2.0. I am using ignore_errors: true and it works :)

@galindro

How could I reboot the server using this technique from the ad-hoc command line? I've tried the command below, but it doesn't work even though it shows a success message. The VM isn't rebooted...

$ ansible tag_v3_api_update_True -B 1 -P 0 -b -v -i ec2.py -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'

background launch...


54.94.167.25 | success >> {
    "ansible_job_id": "601411637602.2026",
    "results_file": "/root/.ansible_async/601411637602.2026",
    "started": 1
}

@galindro

My mistake... this is the correct way:

ansible tag_v3_api_update_True -B 1 -P 0 -b -v -i ec2.py -m shell -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'

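Now it works. Without -m shell, the ad-hoc command uses the default command module, which does not run its arguments through a shell, so && is not interpreted.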
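For reference, a minimal sketch of the playbook task that should correspond to the working ad-hoc command above, going by the usual meaning of the flags: `-m shell` selects the shell module, `-B 1` maps to `async: 1`, `-P 0` to `poll: 0`, and `-b` to `become: yes` (`become` needs Ansible 1.9 or later). The host pattern and the `ec2.py` inventory stay on the command line rather than in the task.

```yaml
# Playbook counterpart of:
#   ansible <pattern> -B 1 -P 0 -b -m shell -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'
- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"   # -m shell -a '...'
  async: 1       # -B 1: run in the background with a 1-second async timeout
  poll: 0        # -P 0: fire and forget, do not poll for the result
  become: yes    # -b: privilege escalation for the whole module run
```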

@amitkumarjha

Delete the files on the Ansible control machine (the one you are running the commands from):

cd $HOME/.ansible/cp/

rm -rf

A1ve5 pushed a commit to fgci-org/fgci-ansible that referenced this issue Feb 8, 2016
 - Applying: ansible/ansible#10616
 - Not tested
@mtpereira
Contributor

Hello,

It clearly depends on the operating system. Moving from Debian 7 (Wheezy) to 8 (Jessie) on a Vagrant box surfaced this issue. I've upgraded from Ansible 1.9.4 to 2.0.0.2 and get exactly the same behaviour with both versions. If I had to guess, I'd say the order in which services are stopped has changed and is causing this.

@gvenka008c

Is there any way to gather the facts again after the machine reboots and we have waited for it to come back? Once the machine reboots, I need to gather the facts again before moving on with the next plays. Any thoughts?
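One possible approach, sketched under assumptions: the reboot was issued with `async: 1` / `poll: 0` as in the tasks above, sshd on the host listens on port 22, and `inventory_hostname` resolves to the host's address from the control machine. `wait_for` runs locally and polls the port; once it answers, the `setup` module re-gathers facts so later tasks and plays see fresh values.

```yaml
# Sketch: wait for the rebooted host, then refresh its facts.
# Assumes sshd on port 22 and that inventory_hostname is reachable from the control machine.
- name: Wait for SSH to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=600 state=started
  become: false   # runs on the control machine; no privilege escalation needed

- name: Re-gather facts after the reboot
  setup:
```

The `delay=30` gives the host time to actually go down before polling starts; a fixed delay is only a guess at how long shutdown takes on a given system.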

@systeminsightsbuild

I am now seeing this with 1.9.4, but not with a shell task. It does not always fail, but it fails often enough.

- name: Create the agent import file
  copy:
    content: "{{inventory_hostname}},{{ansible_hostname}}"
    dest: "{{ossec_dir}}/tmp/agent-{{ansible_hostname}}"
    group: "{{ossec_group}}"
  delegate_to: "{{ossec_server}}"
  when: _register_agent

@dflock
Contributor

dflock commented May 28, 2016

I had a task that worked on ansible < 2.1.0.0 start to fail in this way with 2.1.0.0. The solution from @jder above worked for me.

Old task:

- name: 'Restarting host machine(s) (Shows errors - OK to ignore!)'
  command: shutdown -r now 
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

New version that works with 2.1.0.0:

- name: 'Restarting host machine(s) (Shows errors - OK to ignore!)'
  shell: sleep 2 && shutdown -r now 
  async: 1
  poll: 0
  ignore_errors: true
  become: yes

@cameronkerrnz

cameronkerrnz commented Jun 26, 2016

I think the most graceful way to wait for a reboot to complete would be to note the boot time before issuing a reboot, and then wait for the boot time to change. Some systems take a long time to complete shutdown, and assuming that SSH becomes unreachable within a couple of minutes of starting the shutdown is asking for problems.

I'm not sure how to code that in Ansible yet. I wonder how portable the command 'uptime --since' is in the Linux world... it doesn't appear to be present in RHEL5 or RHEL6, and first appears in RHEL7.

At least the following should be highly portable on Linux systems:

expr $(date +%s) - $(cut -d. -f1 /proc/uptime)

or if you want something more human readable

date --date=@$(expr $(date +%s) - $(cut -d. -f1 /proc/uptime))

or something with ISO8601 date format, as an ad-hoc shell command

ansible cf_canary -m shell -a 'date +%Y-%m-%dT%H:%M:%S --date=@$(expr $(date +%s) - $(cut -d. -f1 /proc/uptime))'

Output is like: 2016-06-24T14:58:05
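A sketch of that idea as playbook tasks, assuming a Linux target (it relies on /proc/uptime) and sshd on port 22. Instead of a fixed delay, it waits for the SSH port to close and then reopen, and only then compares boot times. Because the boot time is computed from two whole-second readings, it can wobble by a second between calls, so the check requires it to move forward by more than a couple of seconds rather than merely differ. The 1800-second timeouts are arbitrary placeholders.

```yaml
# Sketch: confirm a real reboot by comparing boot time (epoch seconds) before and after.
- name: Record boot time before rebooting
  shell: expr $(date +%s) - $(cut -d. -f1 /proc/uptime)
  register: boot_before
  changed_when: false

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true
  become: yes

# No fixed delay: wait however long shutdown takes for the port to close...
- name: Wait for SSH to go down
  local_action: wait_for host={{ inventory_hostname }} port=22 state=stopped timeout=1800
  become: false

# ...then for it to open again once the host is back.
- name: Wait for SSH to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 state=started delay=10 timeout=1800
  become: false

- name: Record boot time after rebooting
  shell: expr $(date +%s) - $(cut -d. -f1 /proc/uptime)
  register: boot_after
  changed_when: false

# Allow for the +/- 1 second rounding wobble in the computed boot time.
- name: Check that the boot time moved forward
  assert:
    that:
      - "(boot_after.stdout | int) > (boot_before.stdout | int) + 2"
```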

@dflock
Contributor

dflock commented Jun 26, 2016

It seems like there's enough complexity here for a reboot module that would encapsulate all the platform-specific differences in rebooting a node.

There's a Windows one, contributed by @nitzmahone here: #15314, shipping with 2.1, docs here: https://github.com/ansible/ansible-modules-core/pull/3376/files - and an issue requesting a cross-platform module here: #16186

@nitzmahone
Member

@cameronkerrnz I went for a combination in win_reboot: I actually added the ansible_lastboot fact to Windows for that exact purpose, but for various reasons ended up not shipping the version of win_reboot that used it (instead it waits for the port to go down, then back up, then for a "canary" command to succeed over WinRM). I might still revisit that at some point. At least on WinRM, it's expensive to wait for the connect timeouts and such if the port's not open and responding; ssh might be a little cheaper there, but it's hard to tell without just trying (plus some systems refuse connections until SSH is ready, while others will accept and tarpit/block).

I think the only real question outstanding is how generic we want this to be? E.g., do we want to have a python dep on the client or just make it purely ssh/shell/command-based, so it can potentially be used on things like switches/routers/embedded devices? I'd kinda lean toward the latter (where you could override the command that gets sent for your platform of choice), but that might somewhat limit some of the other behaviors (eg, calculating "yes, we actually rebooted" by sampling an uptime cmd, etc).

@cameronkerrnz

@nitzmahone As a useful point of self-imposed abstraction to make later room for switches/etc., might I suggest the module be something like posix_reboot? Presumably, all such platforms would have to have Python 2.4 anyway, no?

One thing to keep in mind, which helps limit the scope of what this module could reasonably achieve, is that noting/comparing the boot time is in itself only useful for answering the question "have we rebooted yet [or is something still taking ages to shut everything down]".

I'm not convinced there is a good way of answering the question "are we ready to resume our play yet"... at least not without some knowledge of the platform deployment; so that should reasonably be left up to the play writer...

E.g., in RHEL6 or any other SysV init system, even if you can SSH into the machine while it is booting, it might still be busy running fsck.

...but I suppose a useful measure would be to run file: path=/stillbooting state=touch as the last action before rebooting, put rm /stillbooting at the end of rc.local, and then use wait_for: path=/stillbooting state=absent. Perhaps there is more hope in RHEL7 and the systemd world.

But you can certainly expect to need something that implements multiple strategies, like the hostname module does. That would be a given simply to determine the boot time.
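The marker-file idea from this comment might look like the following. It is a sketch under assumptions: the target's rc.local already ends with `rm -f /stillbooting` and is actually executed at boot (on RHEL7, for example, /etc/rc.d/rc.local only runs if it is executable), and the reboot and wait-for-SSH tasks are whatever you already use. With the `file` module, `state=touch` is the state that creates an empty file.

```yaml
# Sketch of the /stillbooting marker approach.
# Assumes rc.local on the target ends with "rm -f /stillbooting" and runs at boot.
- name: Create the marker file as the last action before rebooting
  file: path=/stillbooting state=touch
  become: yes

# ... your reboot task and "wait for SSH to come back" task go here ...

- name: Wait until boot has reached rc.local and removed the marker
  wait_for: path=/stillbooting state=absent timeout=600
```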

@giannisalinetti

giannisalinetti commented Dec 21, 2016

This workaround solved the issue on my Fedora 24 with ansible-2.2.0.0 managing a CentOS 7.3 virtual machine after a yum update. Using the shell module is essential because && is a shell operator.

- name: Reboot the server
  shell: sleep 2 && shutdown -r now 'Maintenance reboot'
  async: 1
  poll: 0
  ignore_errors: true

landam added a commit to gislab-npo/gislab that referenced this issue Jan 27, 2017
@Dionysusio

It solved my problem! Thank you!

@Cougar
Contributor

Cougar commented Jul 24, 2018

With the raw module (when Python is not installed or usable), this sleep hack doesn't work, and async is not usable either. I tried a lot of different command lines, and the only one that finally worked was:

- name: restart node
  raw: "echo -e '#!/bin/sh\n(sleep 2; sudo shutdown --reboot now) &\n' > /tmp/reboot.sh && chmod +x /tmp/reboot.sh && nohup /tmp/reboot.sh"

@realtebo

What is the source of this problem?
Yesterday I ran my script without any problem, and this morning, for the first time, I got the same issue.
The script has not changed, and the VM has the same base.
My hosts file has changed, but I have no idea how or what I've broken.

@ansible ansible locked and limited conversation to collaborators Apr 25, 2019