Unable to establish a SSH connection following reboot due to change of IP #8528
Comments
#5642 looks to be reporting the same problem. However, in that report the issue occurs when building CoreOS with Hyper-V. As stated in the issue description, I do not think this issue will be specific to any one builder or OS. The core issue appears to be that Packer assumes an instance will always receive the same IP address/lease across reboots. Currently, Packer first enters a loop to determine the IP address it should try to connect to. Clearly, the logic used is dependent on the builder. Once Packer has determined the IP it enters another loop that attempts to establish a connection. If the IP address changes while Packer is in this second loop (as is possible when the machine reboots) then the connection attempt will eventually time out and fail. To fix this the two loops need to be merged, i.e. Packer should continually try to determine the IP address that has been assigned to the instance and then attempt to connect in the same loop. The way the code is structured at present makes this difficult to fix as the connection logic/loop has been broken out into a generic helper that all builders use. |
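To make the "merge the two loops" idea concrete, here is a minimal sketch in Python (not Packer's actual Go code). The names `wait_for_connection`, `discover_ip` and `try_connect` are hypothetical stand-ins for the builder-specific address lookup and the communicator's connection attempt. The only point is that the address is looked up again before every attempt instead of once up front.

```python
import time
from typing import Callable, Optional


def wait_for_connection(
    discover_ip: Callable[[], Optional[str]],
    try_connect: Callable[[str], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Re-resolve the guest address before every connection attempt.

    discover_ip and try_connect are hypothetical callbacks: the first
    returns the address currently assigned to the guest (or None), the
    second returns True once a connection has been established.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        ip = discover_ip()  # looked up on every pass, not cached
        if ip is not None and try_connect(ip):
            return ip
        sleep(interval)
    raise TimeoutError("guest never became reachable")
```

With this shape, a guest that moves from 172.16.83.128 to 172.16.83.129 across a reboot is picked up on the next pass through the loop, provided the lookup itself reports the new address.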
FWIW, I'm experiencing the same issue with packer 1.5.1 and Photon OS 3 Rev 2. It seems the VM grabs a lease in order to boot with kickstart, but that IP changes by the time the guest is rebooted and ready for Packer to take over. Except Packer is waiting for the IP that it must have seen first -- the one used to access the kickstart config over HTTP. |
I just got Photon OS 3 working by adding the {
"hostname": "haproxy-lb",
"password": {
"crypted": false,
"text": "photon"
},
"disk": "/dev/sda",
"install_linux_esx": true,
"packages": [
"minimal",
"linux",
"initramfs"
],
"additional_packages": [
"ca-certificates",
"curl",
"gzip",
"haproxy",
"jq",
"lsof",
"lvm2",
"ntp",
"openssh-server",
"open-vm-tools",
"sed",
"shadow",
"sudo",
"tar",
"vim"
],
"postinstall": [
"#!/bin/sh",
"useradd -U --groups wheel photon && echo 'photon:photon' | chpasswd",
"useradd --system --home-dir=/var/lib/haproxy --user-group haproxy",
"mkdir -p /home/photon",
"chown -R photon:photon /home/photon",
"mkdir -p /var/lib/haproxy",
"chown -R haproxy:haproxy /var/lib/haproxy",
"systemctl enable sshd",
"systemctl disable haproxy",
"echo 'photon ALL=(ALL) NOPASSWD: ALL' >/etc/sudoers.d/photon",
"chmod 440 /etc/sudoers.d/photon",
"tdnf clean all"
]
} |
EDIT: Actually, I'm still encountering this issue - even after adding open-vm-tools to the preseed. @akutz Thanks for that! Adding open-vm-tools to the Debian preseed file (Debian's kickstart equivalent) worked for me too.
I haven't looked too deeply, but clearly, this enables some interaction/magic to occur that ensures the instance keeps the same IP address across reboots - perhaps something in /etc/vmware-tools/scripts/vmware/network?? While this is a viable workaround for the given OS/platform combinations documented here, I expect this is a bug that will continue to resurface for Packer users due to the way the IP discovery and connection logic is currently structured. Packer needs to be able to handle the situation where the IP address changes across a reboot/DHCP address renewal. Workarounds of the kind documented here may not always be available. |
@akutz Unfortunately, I've now found that only worked for me once! Has adding open-vm-tools to your kickstart solved the problem for you consistently? |
This will probably not be an easy change to make based on Packer's architecture and the way our communicators currently work, but I agree that ideally Packer would support situations where the IP address changes. |
I wonder why this is an issue with the architecture, since it works with the vmware-iso builder when building on ESXi. There I am using dnsmasq as my DHCP server, so there is no leases file to look in. Instead Packer has to look up the address using open-vm-tools or the ESXi API (I guess). I tried to build Debian 10 with open-vm-tools in the preseed, but it still doesn't work. |
I found a temporary workaround, or rather a hack. The config for the networking file in /etc/vmware/networking is as follows:
The config file in /etc/vmware/vmnet10/dhcpd/dhcpd.conf is as follows:
The config file in /etc/vmware/vmnet10/nat/nat.conf is as follows:
|
I also found that on Photon it was failing some time after the first attempt following a reboot. Finally I got it working by killing this in between attempts:
$ sudo ps alx | grep vagrant
0 2316 1 0 20 0 558440636 384 - Ss ?? 0:00.11 /opt/vagrant-vmware-desktop/bin/vagrant-vmware-utility api -port=9922
Keep in mind, I'm not running Vagrant. But I bet Packer is utilizing something from Vagrant. |
Based on this changelog, https://github.com/hashicorp/vagrant-plugin-changelog/blob/master/vagrant-vmware-utility-changelog.md, it does appear the |
Perhaps related to hashicorp/vagrant#9915? |
It seems that, when using the WinRM communicator, Packer constantly queries for a new IP. VMware Workstation 15 uses ISC DHCP version 2, so the option to ignore the client ID is not implemented. Many standard DHCP servers prefer the client ID over the hardware ID by default when one is present. Even the very new Kea DHCP server from ISC uses that as its default. That's why I came up with this "workaround". It seems that the option to ignore the client identifier, in combination with the script, is a proper workaround. I created a gist for people having the same problem: |
Hello. We had the same issue, and the following approach worked for us: build the virtual machine for VirtualBox, as it can later be imported into VMware with ease. I have not found any limitations of this approach so far, and it works with Debian 10 at least. |
For me adding open-vm-tools to the preseed file did not work. I found a workaround to add to the preseed file:
That sets the dhcp-client-identifier option so that the MAC address is used. This was the default on versions prior to Debian 10 (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906894). |
@jan-z Nice workaround! This works for me too. Can I ask where you found the documentation for setting the client identifier to The calls made by The next version of |
@DanHam I can't find any documentation for that setting. I got it from here: https://www.reddit.com/r/debian/comments/ca5vjb/dhcp_identifiers_changed_on_upgrade_to_buster/ |
@jan-z Ah OK. I had seen that... I've taken a more in-depth look. It's a bit difficult to find but this is documented in the @SwampDragons @azr The workaround put forward by @jan-z is the right way to fix this - at least until THIS FIX works its way into the next Debian ifupdown package. Do you want to close this or leave it open for a while for others coming across the same issue? |
See: hashicorp/packer#8528 The ifupdown package in Debian 10 (ifupdown 0.8.35) is now configured to call dhclient with a flag that sends the DUID as the host identifier when obtaining an IP address from the DHCP server. This differs from the installer which uses the MAC address as the identifier. As a consequence, the DHCP server issues different IP addresses to the installer and the newly built host. This causes issues for Packer as the IP address used to contact the host is only enumerated when the installer is running at the start of the build. Since the newly booted host has a different IP address, Packer fails to establish a connection to the host. This commit fixes the issue by configuring the DHCP client in the newly built host to use the MAC address - the same as is done by the installer. Once Packer has established a connection, the change is reverted to the default configuration shipped with the isc-dhcp-client package. Future versions of ifupdown will allow the user to configure if dhclient is called with the flag to send the DUID to the DHCP server.
🤔 hm, IMHO, a doc page would be nice, like a short title/description that matches the issue so it's easy to Google, stating all possible options. So users can understand the issue and pick the path they want: update/fix/else. Edit: Super good findings ! |
I'm experiencing the same issue with Workstation 15.5 (Ubuntu), Packer 1.5.4 and Photon OS 3 Rev 2. The workaround with the open-vm-tools package isn't working. Do you know how I can work around it without changing the VMware configuration? |
@jan-z workaround works for me too with packer version 1.5.5 |
Buster has two preseed changes:
* We now need to set partman to use the full size of the new partition,
* We need to change the new default DHCP lease behaviour to use the MAC address, so that during the build process we keep the same IP address
Here, we move the existing preseed to `preseed-legacy.cfg` to keep it the same and introduce a new one. A given preseed file can only contain a single `late_command`, so this replaces the existing one …this isn't needed anyway. hashicorp/packer#8528
Same for me on Archlinux builds. Worked like a charm before but not anymore in 1.5. To be more precise the KVM build still works, just vmware does not. After the reboot the vm has a different IP and packer can't connect |
Same issue here with Packer 1.5.6, Photon OS 3 Rev2, and Fusion 11.5.3. |
@pierreilki - I tried that as well :( It still wanted to pick up a new DHCP record causing packer to get the wrong ip. I also tried setting the hostname of the system to match that of the installer. In my case, this was |
…/packer#8528 and it works just fine now with vmware packer provider
…generated and used during the entire build process. This changes the boot-command used for building the box so that we can seed systemd with the generated machine-id. As CoreOS does not have a forgiving timeout, we have to literally mash keys and then hit backspace in order to catch the boot loader in order to type in our id at the commandline. This machine-id is also fed into the ignition file in order to seed the box with it on first boot. The intention of these changes is that systemd's dhcp client will retain the same address during the entire build process. All of this is to work around issue hashicorp/packer#8528 and is pretty much because Packer is unable to re-determine the address of the guest if it changes whilst in the middle of building. It turns out that the latest version of CoreOS resets the machine-id after the install which results in this specific issue.
Hey y'all. So I wrote some unit tests for the majority of the vmware builder parsers (#9303), and did some refactoring of the dhcpd lease parsers (#9319). The reason is that it looks like this issue can be "kind of" solved in This I had to rewrite the dhcpd lease parsers so that it would first-of-all be easier to test, but so that it would not only parse the dhcpd leases... but be wayyy easier to extract more than one match, and on any particular field ( There may be some issues with doing this that I don't see yet, but I broke up my intentions into separate stages so that it can be easier to review their individual modifications. PR #9319 should be completely backwards compatible with the way the dhcpd lease parser is currently working, and the next PR (which I'm going to start working on in a minute) will end up working in a non-backwards-compatible way due to changing the way that |
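As a rough illustration of "extract more than one match" from a dhcpd leases file, here is a small Python sketch. It is not the Go parser from the PRs mentioned above. The sample text is hypothetical: the two addresses are the ones from this issue, while the MAC addresses, timestamps and `uid` value are made up, and the regular expressions assume the simple ISC-style `lease <address> { ... }` layout with no braces inside quoted values.

```python
import re
from typing import Dict, List

# Hypothetical sample: MACs, timestamps and uid are invented for illustration.
SAMPLE_LEASES = """
lease 172.16.83.128 {
    starts 3 2019/12/18 10:00:00;
    ends 3 2019/12/18 10:30:00;
    hardware ethernet 00:0c:29:aa:bb:cc;
}
lease 172.16.83.129 {
    starts 3 2019/12/18 10:10:00;
    ends 3 2019/12/18 10:40:00;
    hardware ethernet 00:0c:29:aa:bb:cc;
    uid 01:00:0c:29:aa:bb:cc;
}
lease 172.16.83.130 {
    starts 3 2019/12/18 10:05:00;
    ends 3 2019/12/18 10:35:00;
    hardware ethernet 00:0c:29:11:22:33;
}
"""

LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
FIELD_RE = re.compile(
    r"^\s*(starts|ends|hardware ethernet|uid)\s+(.*?);\s*$", re.MULTILINE
)


def parse_leases(text: str) -> List[Dict[str, str]]:
    """Return one dict per lease block, keeping field values as raw strings."""
    leases = []
    for address, body in LEASE_RE.findall(text):
        entry = {"address": address}
        for key, value in FIELD_RE.findall(body):
            entry[key] = value
        leases.append(entry)
    return leases


def leases_for_mac(text: str, mac: str) -> List[Dict[str, str]]:
    """Return every lease recorded for the MAC, not just the first one."""
    wanted = mac.lower()
    return [
        lease for lease in parse_leases(text)
        if lease.get("hardware ethernet", "").lower() == wanted
    ]
```

Calling `leases_for_mac(SAMPLE_LEASES, "00:0c:29:aa:bb:cc")` yields both the installer's lease and the post-reboot lease, so the caller can try each address rather than committing to the first match.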
Okay...I'm actually surprised it works, but PR #9322 reworks Now it'll take like a second or two for it to recognize that ssh is up, but packer seems to recognize the new address and continue to the next multistep like it's supposed to. So.. PR #9322 should fix this properly, and without hacking up your guest or VMware configuration. |
I wanted to take a moment to thank you for all your hard work on this @arizvisa! |
Thx. Anything for another austinite. ;) |
Unit tests for the driver_parser.go functionality of the vmware builder
I think this was accidentally closed due to a “close” keyword in #9303, while that PR doesn’t actually fix this issue. |
Good catch, thanks. |
This refactors the dhcpd lease parser in the vmware builders and adds unit tests for everything.
Fix the VMware builders when the guest platform's dhcpcd switches the ip address in-between a build
I think this was closed "for real" by PR 9322. We'll be releasing v1.6.0 early next week. |
Fix confirmed using the configuration files linked above by @DanHam. Note I fixed the deprecation issue locally for the
|
Since a list of hosts is being checked linearly depending on how many leases match (as opposed to before)... Is there a noticeable difference for y'all in the time it takes to detect the new address from the previous method? It likely doesn't really matter, but is it as significant for you guys as it is for me? Also, is the vmware builder the only one which uses the method of parsing the dhcp leases in order to determine the address of the guest? |
@arizvisa Just like to second the thanks above for the fix! Really appreciated! I have to say, I didn't really time my build or watch it too closely, but I didn't notice any significant delays. Things worked pretty much as they did before for me. |
@arizvisa A quick update on my comment above. I've now built a few boxes with the latest Packer build. I've had mixed results with respect to the time it takes for the new address to be picked up by Packer. Sometimes the address is picked up quickly. Other times there is a very noticeable delay - in the region of minutes - while the box is sitting there post reboot waiting for Packer to connect. |
Hmm.. I wonder if performance can be improved slightly by sorting the list of leases that we parse in descending order (at the end of the Anyways, just some potential solutions to consider. I'll leave this experimentation up to the maintainers or another contributor for the moment perhaps until I can find more time. |
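The sorting idea could look roughly like the Python sketch below. This is an illustration only, not the builder's code. `Lease`, `newest_first` and `first_reachable` are hypothetical names, and the timestamps are invented. The assumption is that the most recently started lease is the most likely to be live, so probing in that order should reach the guest's current address sooner.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Lease:
    address: str
    starts: datetime  # when the DHCP server handed out this lease


def newest_first(leases: Iterable[Lease]) -> List[str]:
    """Order candidate addresses by lease start time, newest first, without duplicates."""
    ordered = sorted(leases, key=lambda lease: lease.starts, reverse=True)
    seen = set()
    addresses = []
    for lease in ordered:
        if lease.address not in seen:
            seen.add(lease.address)
            addresses.append(lease.address)
    return addresses


def first_reachable(
    addresses: Iterable[str], probe: Callable[[str], bool]
) -> Optional[str]:
    """Return the first address the (hypothetical) probe accepts, else None."""
    for address in addresses:
        if probe(address):
            return address
    return None
```

Whether this removes the multi-minute delays reported above is untested here. It only reduces the number of stale addresses tried before the current one.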
Makes sense, thanks @arizvisa for getting it this far :) |
course. i got u. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Overview of the Issue
Packer is unable to establish an SSH connection after the installer has completed and the instance has been rebooted. This occurs because the IP address assigned to the instance by the DHCP server changes following the reboot. Packer determines the IP address it should connect to at the start of the build and never picks up on the fact the instance has been assigned a new address.
The issue only seems to manifest itself when building Debian 10 and using the VMware ISO builder - perhaps because of the newly introduced 'default-uid' used by the DHCP client (see below)? However, I'm fairly confident that DHCP servers do not have to provide an instance with the same IP address as leased previously. The current logic built into Packer does not allow for this possibility.
I am able to connect to the instance manually. The same template works fine with VirtualBox 6.0.14. Note that Debian 9 works with both VMware and VirtualBox.
Reproduction Steps
Run the template referenced below.
The Debian ISO used in the build can be downloaded HERE
Packer version
Packer v1.5.1-dev (66445ec)
Simplified Packer Buildfile
See the `debian-10.json` template HERE
Operating system and Environment details
VMware Fusion Professional Version 8.5.10 (7527438)
Log Fragments and crash.log files
While the installer is running the contents of the VMware DHCPD leases file (`/var/db/vmware/vmnet-dhcpd-vmnet8.leases`) is as follows:
Once the installer has completed and the instance has rebooted the DHCPD leases file has a new entry with the same MAC address. Note the addition of the `uid` field in the second entry. Clearly, the VMware DHCPD implementation has decided that this is a 'new' machine that needs a new address:
Meanwhile, Packer continues attempting to connect to the address parsed from the leases file when the initial IP address was requested by the installer (172.16.83.128). The following snippet from the logs shows the change in message before and after the reboot of the instance.
The first message is displayed when the installer is running - the correct address has been recorded but the SSH daemon is not running so we get a connection refused error.
The second message is displayed post reboot when the instance has obtained a different IP address - this time 172.16.83.129. However, Packer continues to attempt to connect on the first address as it assumes the address assigned to an instance will remain the same across reboots - in other words it only parses the leases file once at the beginning of the run. Clearly, Packer will never be able to connect as the IP address is now different.
If I manually SSH into the instance I am able to view the instance's dhclient leases file: