
Unable to establish an SSH connection following reboot due to change of IP #8528

Closed
DanHam opened this issue Dec 20, 2019 · 42 comments · Fixed by #9303

@DanHam
Contributor

DanHam commented Dec 20, 2019

Overview of the Issue

Packer is unable to establish an SSH connection after the installer has completed and the instance has been rebooted. This occurs because the IP address assigned to the instance by the DHCP server changes following the reboot. Packer determines the IP address it should connect to at the start of the build and never notices that the instance has since been assigned a new address.

The issue only seems to manifest when building Debian 10 with the VMware ISO builder - perhaps because of the newly introduced 'default-duid' used by the DHCP client (see below)? However, I'm fairly confident that DHCP servers are not obliged to give an instance the same IP address it leased previously. The logic currently built into Packer does not allow for this possibility.

I am able to connect to the instance manually. The same template works fine with Virtualbox 6.0.14. Note that Debian 9 works with both VMware and Virtualbox.

Reproduction Steps

Run the template referenced below.

The Debian ISO used in the build can be downloaded HERE

Packer version

Packer v1.5.1-dev (66445ec)

Simplified Packer Buildfile

See the debian-10.json template HERE

Operating system and Environment details

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14.6
BuildVersion:   18G2022

VMware Fusion Professional Version 8.5.10 (7527438)

Log Fragments and crash.log files

While the installer is running, the contents of the VMware DHCPD leases file (/var/db/vmware/vmnet-dhcpd-vmnet8.leases) are as follows:

# All times in this file are in UTC (GMT), not your local timezone.   This is
# not a bug, so please don't ask about it.   There is no portable way to
# store leases in the local timezone, so please don't request this as a
# feature.   If this is inconvenient or confusing to you, we sincerely
# apologize.   Seriously, though - don't ask.
# The format of this file is documented in the dhcpd.leases(5) manual page.

lease 172.16.83.128 {
        starts 5 2019/12/20 18:18:22;
        ends 5 2019/12/20 18:48:22;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid 01:00:0c:29:b6:82:7a;
}

Once the installer has completed and the instance has rebooted, the DHCPD leases file has a new entry with the same MAC address. Note the addition of the uid field in the second entry. Clearly, the VMware DHCPD implementation has decided that this is a 'new' machine that needs a new address:

# All times in this file are in UTC (GMT), not your local timezone.   This is
# not a bug, so please don't ask about it.   There is no portable way to
# store leases in the local timezone, so please don't request this as a
# feature.   If this is inconvenient or confusing to you, we sincerely
# apologize.   Seriously, though - don't ask.
# The format of this file is documented in the dhcpd.leases(5) manual page.

lease 172.16.83.128 {
        starts 5 2019/12/20 18:18:22;
        ends 5 2019/12/20 18:48:22;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid 01:00:0c:29:b6:82:7a;
}
lease 172.16.83.129 {
        starts 5 2019/12/20 18:22:19;
        ends 5 2019/12/20 18:52:19;
        hardware ethernet 00:0c:29:b6:82:7a;
        uid ff:29:b6:82:7a:00:01:00:01:25:8f:cd:da:00:0c:29:b6:82:7a;
        client-hostname "localhost";
}

Meanwhile, Packer continues attempting to connect to the address parsed from the leases file when the initial IP address was requested by the installer (172.16.83.128). The following snippet from the logs shows the change in message before and after the reboot of the instance.

2019/12/20 18:22:04 packer-builder-vmware-iso plugin: [DEBUG] TCP connection to SSH ip/port failed: dial tcp 172.16.83.128:22: connect: connection refused
2019/12/20 18:22:24 packer-builder-vmware-iso plugin: [DEBUG] TCP connection to SSH ip/port failed: dial tcp 172.16.83.128:22: i/o timeout

The first message is displayed while the installer is running - the correct address has been recorded, but the SSH daemon is not yet running, so we get a connection refused error.

The second message is displayed post reboot, when the instance has obtained a different IP address - this time 172.16.83.129. However, Packer continues to attempt to connect on the first address, as it assumes the address assigned to an instance will remain the same across reboots - in other words, it only parses the leases file once at the beginning of the run. Clearly, Packer will never be able to connect as the IP address is now different.
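The stale-address behaviour can be illustrated by parsing the leases file directly. Below is a minimal sketch (not Packer's actual parser) that collects every leased IP whose `hardware ethernet` field matches the guest's MAC: a single-shot parse that stops at the first match yields the pre-reboot address, while the last match in file order is the current lease.

```python
import re

def leases_for_mac(leases_text, mac):
    """Return every leased IP whose 'hardware ethernet' field matches
    the given MAC, in file order (oldest lease first)."""
    ips = []
    for ip, body in re.findall(r"lease\s+(\S+)\s*\{([^}]*)\}", leases_text):
        m = re.search(r"hardware ethernet\s+([0-9a-fA-F:]+);", body)
        if m and m.group(1).lower() == mac.lower():
            ips.append(ip)
    return ips
```

Run against the leases file above, this returns both 172.16.83.128 and 172.16.83.129 for MAC 00:0c:29:b6:82:7a; taking only the first entry is exactly the failure described here.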

If I manually SSH into the instance I am able to view the instance's dhclient leases file:

$ cat /var/lib/dhcp/dhclient.eth0.leases
default-duid "\000\001\000\001%\217\315\332\000\014)\266\202z";
lease {
  interface "eth0";
  fixed-address 172.16.83.129;
  option subnet-mask 255.255.255.0;
  option routers 172.16.83.2;
  option dhcp-lease-time 1800;
  option dhcp-message-type 5;
  option domain-name-servers 172.16.83.2;
  option dhcp-server-identifier 172.16.83.254;
  option broadcast-address 172.16.83.255;
  option netbios-name-servers 172.16.83.2;
  option domain-name "localdomain";
  renew 5 2019/12/20 18:35:49;
  rebind 5 2019/12/20 18:48:35;
  expire 5 2019/12/20 18:52:20;
}
@DanHam
Contributor Author

DanHam commented Dec 22, 2019

#5642 looks to be reporting the same problem. However, in that report the issue occurs building CoreOS with Hyper-V. As stated above, I do not think this issue will be specific to any one builder or OS.

The core issue appears to be that Packer assumes an instance will always receive the same IP address/lease across reboots.

Currently, Packer first enters a loop to determine the IP address it should try to connect to; clearly, the logic used is dependent on the builder. Once Packer has determined the IP, it enters another loop that attempts to establish a connection. If the IP address changes while Packer is in this second loop (as is possible when the machine reboots), the connection attempt will eventually time out and fail.

To fix this, the two loops need to be merged, e.g. Packer should continually try to determine the IP address that has been assigned to the instance and then attempt to connect within the same loop.
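The merged-loop idea can be sketched as follows. This is a simplified Python illustration, not Packer's Go implementation; `resolve_ip` stands in for whatever builder-specific discovery logic applies (e.g. re-parsing the dhcpd leases file on every attempt).

```python
import socket
import time

def wait_for_ssh(resolve_ip, port=22, timeout=600, interval=5):
    """Single loop: re-resolve the guest IP on every attempt, so a
    post-reboot address change is picked up automatically."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        ip = resolve_ip()  # re-run IP discovery each pass
        if ip:
            try:
                # A successful TCP connect means the SSH port is reachable.
                with socket.create_connection((ip, port), timeout=interval):
                    return ip
            except OSError:
                pass  # refused or timed out; re-discover and retry
        time.sleep(interval)
    raise TimeoutError("guest never became reachable")
```

Because discovery and connection live in the same loop, a lease change between attempts simply produces a new target address on the next pass.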

The way the code is structured at present makes this difficult to fix, as the connection logic/loop has been broken out into a generic helper that all builders use.

@akutz

akutz commented Dec 22, 2019

FWIW, I'm experiencing the same issue with packer 1.5.1 and Photon OS 3 Rev 2. It seems the VM grabs a lease in order to boot with kickstart, but that IP changes by the time the guest is rebooted and ready for Packer to take over. Except Packer is waiting for the IP that it must have seen first -- the one used to access the kickstart config over HTTP.

@akutz

akutz commented Dec 22, 2019

I just got Photon OS 3 working by adding the open-vm-tools package to the list of packages in the kickstart file:

{
  "hostname": "haproxy-lb",
  "password": {
    "crypted": false,
    "text": "photon"
  },
  "disk": "/dev/sda",
  "install_linux_esx": true,
  "packages": [
    "minimal",
    "linux",
    "initramfs"
  ],
  "additional_packages": [
    "ca-certificates",
    "curl",
    "gzip",
    "haproxy",
    "jq",
    "lsof",
    "lvm2",
    "ntp",
    "openssh-server",
    "open-vm-tools",
    "sed",
    "shadow",
    "sudo",
    "tar",
    "vim"
  ],
  "postinstall": [
    "#!/bin/sh",
    "useradd -U --groups wheel photon && echo 'photon:photon' | chpasswd",
    "useradd --system --home-dir=/var/lib/haproxy --user-group haproxy",
    "mkdir -p /home/photon",
    "chown -R photon:photon /home/photon",
    "mkdir -p /var/lib/haproxy",
    "chown -R haproxy:haproxy /var/lib/haproxy",
    "systemctl enable sshd",
    "systemctl disable haproxy",
    "echo 'photon ALL=(ALL) NOPASSWD: ALL' >/etc/sudoers.d/photon",
    "chmod 440 /etc/sudoers.d/photon",
    "tdnf clean all"
  ]
}

@DanHam
Contributor Author

DanHam commented Dec 26, 2019

EDIT: Actually, I'm still encountering this issue - even after adding open-vm-tools to the preseed.

@akutz Thanks for that! Adding open-vm-tools to the Debian preseed file (Debian's kickstart equivalent) worked for me too.

d-i pkgsel/include string open-vm-tools [...space separated list of additional packages to install into the target system]

I haven't looked too deeply, but clearly this enables some interaction/magic that ensures the instance keeps the same IP address across reboots - perhaps something in /etc/vmware-tools/scripts/vmware/network?

While this is a viable workaround for the OS/platform combinations documented here, I expect this bug will continue to resurface for Packer users due to the way the IP discovery and connection logic is currently structured.

Packer needs to be able to handle the situation where the IP address changes across a reboot/dhcp address renewal. Workarounds of the kind documented here may not always be available.

@DanHam
Contributor Author

DanHam commented Dec 30, 2019

@akutz Unfortunately, I've now found that only worked for me once! Has adding open-vm-tools to your kickstart solved the problem for you consistently?

@akutz

akutz commented Jan 2, 2020

@akutz Unfortunately, I've now found that only worked for me once! Has adding open-vm-tools to your kickstart solved the problem for you consistently?

Hi @DanHam,

I've built the image now several times sans any issues.

@SwampDragons
Contributor

This will probably not be an easy change to make based on Packer's architecture and the way our communicators currently work, but I agree that ideally Packer would support situations where the IP address changes.

@llxp

llxp commented Jan 10, 2020

I wonder why this is an architectural issue, since the vmware-iso builder also works when targeting an ESXi host. There I am using dnsmasq as my DHCP server, so there is no leases file to inspect; instead Packer has to discover the IP via open-vm-tools or the ESXi API (I guess).
So why not continually watch the leases file for changes and simply take the last entry as the valid IP?

I tried to build Debian 10 with open-vm-tools in the preseed, but it still doesn't work.
I tried version 1.5.1 (official build) with VMware Workstation 14.1.7 build-12989993, and also with version 15.5.1 build-15018445.

@llxp

llxp commented Jan 10, 2020

I found a temporary workaround, or rather a hack.
I created a new interface with a subnet mask of 255.255.255.248 and chose a DHCP range of two IPs, so the DHCP server can only give two addresses to the VM:
the first IP is given during the installation, and the second after the first reboot.

The config for the networking file in /etc/vmware/networking is as follows:

VERSION=1,0
answer VNET_10_DHCP yes
answer VNET_10_DHCP_CFG_HASH 8D292DB10AA5381B846E260EADE516BB459E6D65
answer VNET_10_HOSTONLY_NETMASK 255.255.255.248
answer VNET_10_HOSTONLY_SUBNET 172.16.230.0
answer VNET_10_NAT yes
answer VNET_10_NAT_PARAM_UDP_TIMEOUT 30
answer VNET_10_VIRTUAL_ADAPTER yes
answer VNET_1_DHCP yes
answer VNET_1_DHCP_CFG_HASH B70C98E2E155E3E7349FFCA26CE5694851E233FB
answer VNET_1_HOSTONLY_NETMASK 255.255.255.0
answer VNET_1_HOSTONLY_SUBNET 172.16.65.0
answer VNET_1_VIRTUAL_ADAPTER yes
answer VNET_8_DHCP yes
answer VNET_8_DHCP_CFG_HASH 39B4FEBF27D7259C57192A984AA39AD7DDA1FAC4
answer VNET_8_HOSTONLY_NETMASK 255.255.255.0
answer VNET_8_HOSTONLY_SUBNET 172.16.229.0
answer VNET_8_NAT yes
answer VNET_8_VIRTUAL_ADAPTER yes
answer VNL_DEFAULT_BRIDGE_VNET -1
add_bridge_mapping ens192 -1
add_bridge_mapping br1 0

The config file in /etc/vmware/vmnet10/dhcpd/dhcpd.conf is as follows:

# Configuration file for ISC 2.0 vmnet-dhcpd operating on vmnet10.
#
# This file was automatically generated by the VMware configuration program.
# See Instructions below if you want to modify it.
#
# We set domain-name-servers to make some DHCP clients happy
# (dhclient as configured in SuSE, TurboLinux, etc.).
# We also supply a domain name to make pump (Red Hat 6.x) happy.
#


###### VMNET DHCP Configuration. Start of "DO NOT MODIFY SECTION" #####
# Modification Instructions: This section of the configuration file contains
# information generated by the configuration program. Do not modify this
# section.
# You are free to modify everything else. Also, this section must start
# on a new line
# This file will get backed up with a different name in the same directory
# if this section is edited and you try to configure DHCP again.

# Written at: 01/10/2020 11:01:36
allow unknown-clients;
default-lease-time 1800;                # default is 30 minutes
max-lease-time 7200;                    # default is 2 hours

subnet 172.16.230.0 netmask 255.255.255.248 {
        range 172.16.230.4 172.16.230.6;
        option broadcast-address 172.16.230.7;
        option domain-name-servers 172.16.230.2;
        option domain-name localdomain;
        default-lease-time 1800;                # default is 30 minutes
        max-lease-time 7200;                    # default is 2 hours
        option netbios-name-servers 172.16.230.2;
        option routers 172.16.230.2;
}
host vmnet10 {
        hardware ethernet 00:50:56:C0:00:0A;
        fixed-address 172.16.230.1;
        option domain-name-servers 0.0.0.0;
        option domain-name "";
        option routers 0.0.0.0;
}
####### VMNET DHCP Configuration. End of "DO NOT MODIFY SECTION" #######

The config file in /etc/vmware/vmnet10/nat/nat.conf is as follows:

# VMware NAT configuration file
# Manual editing of this file is not recommended. Using UI is preferred.

[host]

# NAT gateway address
ip = 172.16.230.2
netmask = 255.255.255.248

# VMnet device if not specified on command line
device = /dev/vmnet10

# Allow PORT/EPRT FTP commands (they need incoming TCP stream ...)
activeFTP = 1

# Allows the source to have any OUI.  Turn this on if you change the OUI
# in the MAC address of your virtual machines.
allowAnyOUI = 1

# Controls if (TCP) connections should be reset when the adapter they are
# bound to goes down
resetConnectionOnLinkDown = 1

# Controls if (TCP) connection should be reset when guest packet's destination
# is NAT's IP address
resetConnectionOnDestLocalHost = 1

# Controls if enable nat ipv6
natIp6Enable = 0

# Controls if enable nat ipv6
natIp6Prefix = fd15:4ba5:5a2b:100a::/64

[tcp]

# Value of timeout in TCP TIME_WAIT state, in seconds
timeWaitTimeout = 30

[udp]

# Timeout in seconds. Dynamically-created UDP mappings will purged if
# idle for this duration of time 0 = no timeout, default = 60; real
# value might be up to 100% longer
timeout = 30

[netbios]
# Timeout for NBNS queries.
nbnsTimeout = 2

# Number of retries for each NBNS query.
nbnsRetries = 3

# Timeout for NBDS queries.
nbdsTimeout = 3

[incomingtcp]

# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
#<external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80

[incomingudp]

# UDP port forwarding example
#6000 = 172.16.3.0:6001

@akutz

akutz commented Jan 10, 2020

I wonder why this is an architectural issue, since the vmware-iso builder also works when targeting an ESXi host. There I am using dnsmasq as my DHCP server, so there is no leases file to inspect; instead Packer has to discover the IP via open-vm-tools or the ESXi API (I guess).
So why not continually watch the leases file for changes and simply take the last entry as the valid IP?

I tried to build Debian 10 with open-vm-tools in the preseed, but it still doesn't work.
I tried version 1.5.1 (official build) with VMware Workstation 14.1.7 build-12989993, and also with version 15.5.1 build-15018445.

I also found that on Photon it was failing a while after the first attempt following a reboot. I finally got it working by killing this process in between attempts:

$ sudo ps alx | grep vagrant
    0  2316     1   0  20  0 558440636    384 -      Ss     ??    0:00.11 /opt/vagrant-vmware-desktop/bin/vagrant-vmware-utility api -port=9922

Keep in mind, I'm not running Vagrant. But I bet Packer is utilizing something from Vagrant.

@akutz

akutz commented Jan 10, 2020

Based on this changelog, https://github.com/hashicorp/vagrant-plugin-changelog/blob/master/vagrant-vmware-utility-changelog.md, it does appear the vagrant-vmware-utility is used for DHCP in some capacity.

@akutz

akutz commented Jan 10, 2020

Perhaps related to hashicorp/vagrant#9915?

@akutz

akutz commented Jan 10, 2020

Hi @DanHam,

I just noticed my vagrant-vmware-utility is 1.0.5 and 1.0.7 (download) is the most recent version. I'm going to upgrade this and see if it helps.

@llxp

llxp commented Jan 12, 2020

It seems that, when using the WinRM communicator, Packer constantly queries for a new IP.
I have now found a more "proper" workaround, better than the one I mentioned earlier (the other hacky workaround does not work reliably).
I am now faking a DHCP server on a bridged network interface by creating a dhcpd.conf and dhcpd.leases file in the /etc/vmware/<vmnet>/dhcpd/ directory.
I fill the leases file by parsing the Packer output and asking my DHCP server on the network (a dnsmasq DHCP, via a custom REST API + curl) which IP belongs to the hardware address. In addition, I configured the DHCP server to assign only one IP per hardware address and to ignore the client ID.
Additionally, I implemented a script on the DHCP server that runs every time a new lease is created. The script then creates an entry in the /etc/ethers file to make the assignment static.

VMware Workstation 15 uses ISC DHCP version 2, so the option to ignore the client ID is not implemented there. Many standard DHCP servers default to preferring the client ID over the hardware address when one is present - even the very new Kea DHCP from ISC uses that default. That's why I came up with this "workaround". Ignoring the client identifier, in combination with the script, seems to be a proper fix.

I created a gist for people having the same problem:
https://gist.github.com/llxp/006ad6c7aa5d81e7283631e76fd1ed71
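For anyone who controls the DHCP server, note that newer ISC dhcpd releases (4.2.2 and later - not the older vmnet-dhcpd bundled with Workstation/Fusion discussed above) expose the behaviour described here as a single directive. A hedged sketch of the relevant dhcpd.conf setting:

```
# dhcpd.conf (ISC dhcpd >= 4.2.2): track leases by hardware address only,
# so a changed client identifier/DUID no longer produces a second lease.
ignore-client-uids true;
```

This keeps the guest on the same lease across the installer/reboot boundary without touching the guest's DHCP client configuration.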

@Akvinikym

Hello.

We had the same issue, and for us the following approach worked: build the virtual machine for VirtualBox, as it can later be imported into VMware with ease. I have not found any restrictions of this approach so far, and it works with Debian 10 at least.

@jan-z

jan-z commented Mar 5, 2020

For me adding open-vm-tools to the preseed file did not work.

I found a workaround to add to the preseed file:

d-i preseed/late_command string \ sed -i 's/^#*\(send dhcp-client-identifier\).*$/\1 = hardware;/' /target/etc/dhcp/dhclient.conf

That sets the dhcp-client-identifier option so that the MAC address is used. This was the default on versions prior to Debian 10 (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906894).

@DanHam
Contributor Author

DanHam commented Mar 8, 2020

@jan-z Nice workaround! This works for me too.

Can I ask where you found the documentation for setting the client identifier to = hardware? Maybe I'm not looking hard enough but I can't seem to find that anywhere in the dhclient docs.

The calls made by ifupdown to dhclient are hardcoded into the ifup binary. The fix for Debian bug 906894 added the '-i' flag. As you know, this flag results in the DUID being sent to the DHCP server, and this has been the root cause of the issues we have been seeing.

The next version of ifupdown should allow the user to configure whether the calls to dhclient include the '-i' flag/send the DUID. See Debian bug 923640 and the fix HERE

@jan-z

jan-z commented Mar 8, 2020

@DanHam I can't find any documentation for that setting. I got it from here: https://www.reddit.com/r/debian/comments/ca5vjb/dhcp_identifiers_changed_on_upgrade_to_buster/

@DanHam
Contributor Author

DanHam commented Mar 9, 2020

@jan-z Ah OK. I had seen that...

I've taken a more in-depth look. It's a bit difficult to find but this is documented in the SETTING OPTION VALUES USING EXPRESSIONS section of the dhcp-options(5) man page. The syntax for the expressions is documented in the dhcp-eval(5) man page.

@SwampDragons @azr The workaround put forward by @jan-z is the right way to fix this - at least until THIS FIX works its way into the next Debian ifupdown package.

Do you want to close this or leave it open for a while for others coming across the same issue?

DanHam added a commit to DanHam/packer-templates that referenced this issue Mar 9, 2020
See: hashicorp/packer#8528

The ifupdown package in Debian 10 (ifupdown 0.8.35) is now configured to
call dhclient with a flag that sends the DUID as the host identifier
when obtaining an IP address from the DHCP server. This differs from the
installer which uses the MAC address as the identifier.

As a consequence, the DHCP server issues different IP addresses to the
installer and the newly built host. This causes issues for Packer as the
IP address used to contact the host is only enumerated when the
installer is running at the start of the build. Since the newly booted
host has a different IP address, Packer fails to establish a connection
to the host.

This commit fixes the issue by configuring the DHCP client in the newly
built host to use the MAC address - the same as is done by the
installer. Once Packer has established a connection, the change is
reverted to the default configuration shipped with the isc-dhcp-client
package.

Future versions of ifupdown will allow the user to configure if dhclient
is called with the flag to send the DUID to the DHCP server.
@azr
Contributor

azr commented Mar 9, 2020

🤔 Hm, IMHO a doc page would be nice - a short title/description that matches the issue so it's easy to Google, stating all the possible options. Then users can understand the issue and pick the path they want: update/fix/other.

Edit: Super good findings !

@melck

melck commented Mar 11, 2020

I'm experiencing the same issue with Workstation 15.5 (ubuntu), packer 1.5.4 and Photon OS 3 Rev 2.

The workaround with the open-vm-tools package isn't working. Do you know how I can work around it without changing the VMware configuration?

@nywilken nywilken self-assigned this Mar 24, 2020
@matteofilippetto

@jan-z workaround works for me too with

packer version 1.5.5
vmware fusion 11.5.3
installing debian-10.3.0-amd64-netinst

nickcharlton added a commit to nickcharlton/boxes that referenced this issue Apr 4, 2020
Buster has two preseed changes:

* We now need to set partman to use the full size of the new partition,
* We need to change the new default DHCP lease behaviour to use the MAC
  address, so that during the build process we keep the same IP address

Here, we move the existing preseed to `preseed-legacy.cfg` to keep it
the same and introduce a new one.

A given preseed file can only contain a single `late_command`, so this
replaces the existing one …this isn't needed anyway.

hashicorp/packer#8528
@sulaweyo

sulaweyo commented Apr 27, 2020

Same for me on Arch Linux builds. This worked like a charm before 1.5, but not anymore. To be more precise, the KVM build still works; only the VMware one does not. After the reboot the VM has a different IP and Packer can't connect.

@kclinden

kclinden commented May 6, 2020

Same issue here with Packer 1.5.6, Photon OS 3 Rev2, and Fusion 11.5.3.

@kclinden

@pierreilki - I tried that as well :( It still wanted to pick up a new DHCP record, causing Packer to get the wrong IP. I also tried setting the hostname of the system to match that of the installer. In my case, this was photon-installer.

akarasulu added a commit to subutai-io/packer that referenced this issue May 18, 2020
…/packer#8528 and it works just fine now with vmware packer provider
@nywilken nywilken removed their assignment May 18, 2020
arizvisa added a commit to arizvisa/lolfuzz3 that referenced this issue May 24, 2020
…generated and used during the entire build process.

This changes the boot-command used for building the box so that we can seed systemd
with the generated machine-id. As CoreOS does not have a forgiving timeout, we have
to literally mash keys and then hit backspace in order to catch the boot loader in
order to type in our id at the commandline. This machine-id is also fed into the
ignition file in order to seed the box with it on first boot.

The intention of these changes are so that systemd's dhcp client will retain the
same address during the entire build process. All of this is to work around issue
hashicorp/packer#8528 and is pretty much because Packer is unable to re-determine
the address of the guest if it changes whilst in the middle of building. It turns
out that the latest version of CoreOS resets the machine-id after the install which
results in this specific issue.
@arizvisa
Contributor

arizvisa commented May 28, 2020

Hey y'all. So I wrote some unit tests for the majority of the vmware builder parsers (#9303), and did some refactoring of the dhcpd lease parsers (#9319). The reason is that it looks like this issue can be "kind of" solved in builder/vmware/common/ssh.go.

This CommHost function asks the driver.GuestIP function for what address to use. Before #9319, the driver.GuestIP function was just using regexes to grab any lease that matched the hardware address. The issue that we're encountering (or at least that I am) is that there's more than one lease with the same hardware address. The only thing that differs is the "uid" field, which is what the dhcpcd in our guest is using to fetch the address. This is what we should actually be keying on, but there's no good way to export the "uid" from a guest. So, why not grab everything that matches and try each one?

I had to rewrite the dhcpd lease parsers so that they would, first of all, be easier to test, but also so they would not only parse the dhcpd leases but make it far easier to extract more than one match, on any particular field (uid in our case). This way, in CommHost, we can ask driver.GuestIP which leases match and check each one individually to see what works.

There may be some issues with doing this that I don't see yet, but I broke my intentions up into separate stages so that the individual modifications are easier to review. PR #9319 should be completely backwards compatible with the way the dhcpd lease parser currently works, and the next PR (which I'm going to start working on in a minute) will work in a non-backwards-compatible way due to changing how CommHost and driver.GuestIP interact.
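The approach described - collect every matching lease, then probe each candidate until one answers - can be sketched like this (a simplified Python illustration of the CommHost idea, not the actual Go code):

```python
import socket

def first_reachable(candidates, port=22, timeout=2):
    """Probe each candidate lease address in turn and return the first
    one that accepts a TCP connection on the SSH port."""
    for ip in candidates:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return ip
        except OSError:
            continue  # dead lease: refused or timed out, try the next
    return None
```

Applied to the leases in this issue, both 172.16.83.128 and 172.16.83.129 would be probed; only the post-reboot address accepts the connection, so that is the one handed to the SSH communicator.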

@arizvisa
Contributor

Okay... I'm actually surprised it works, but PR #9322 reworks driver.GuestIP so that it returns a list of addresses using the new dhcpd.leases parser from PR #9319. Then in CommHost, it takes the list of addresses and tries each one until one of them is valid. That address is then used to SSH to the guest.

Now it'll take a second or two for it to recognize that SSH is up, but Packer seems to recognize the new address and continue to the next multistep like it's supposed to.

So PR #9322 should fix this properly, and without hacking up your guest or VMware configuration.

@akutz

akutz commented May 28, 2020

I wanted to take a moment to thank you for all your hard work on this @arizvisa!

@arizvisa
Contributor

Thx. Anything for another austinite. ;)

SwampDragons added a commit that referenced this issue Jun 2, 2020
Unit tests for the driver_parser.go functionality of the vmware builder
@praseodym

I think this was accidentally closed due to a “close” keyword in #9303, while that PR doesn’t actually fix this issue.

@SwampDragons
Contributor

Good catch, thanks.

@SwampDragons SwampDragons reopened this Jun 3, 2020
SwampDragons added a commit that referenced this issue Jun 4, 2020
This refactors the dhcpd lease parser in the vmware builders and adds unit tests for everything.
SwampDragons added a commit that referenced this issue Jun 5, 2020
Fix the VMware builders when the guest platform's dhcpcd switches the ip address in-between a build
@SwampDragons
Contributor

I think this was closed "for real" by PR 9322. We'll be releasing v1.6.0 early next week.

@nywilken
Member

nywilken commented Jun 8, 2020

Fix confirmed using the configuration files linked above by @DanHam. Note I fixed the deprecation issue locally for the iso_checksum_type configuration attribute before running against v1.6.0-dev.

vmware-iso: output will be in this color.

==> vmware-iso: Retrieving ISO
==> vmware-iso: Trying https://cdimage.debian.org/cdimage/archive/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso                             
==> vmware-iso: Trying https://cdimage.debian.org/cdimage/archive/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso?checksum=sha512%3A5495c8378b829df7386b9bac5bc701f7ad8b2843d088e8636c89549519cf176100eacb90121af3934a8c5229cbe7d2fd23342eda330d56fb45fb2d91f2117fb4                             
==> vmware-iso: https://cdimage.debian.org/cdimage/archive/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso?checksum=sha512%3A5495c8378b829df7386b9bac5bc701f7ad8b2843d088e8636c89549519cf176100eacb90121af3934a8c5229cbe7d2fd23342eda330d56fb45fb2d91f2117fb4 => /home/wilken/pkg/packer-testing-master/vmware-dhcp/packer_cache/aa283600cd4c412a3090a9399e251328ffc7ccfa.iso                                                                       
==> vmware-iso: Creating required virtual machine disks
==> vmware-iso: Building and writing VMX file
==> vmware-iso: Starting HTTP server on port 8586
==> vmware-iso: Starting virtual machine...
==> vmware-iso: Waiting 5s for boot...
==> vmware-iso: Connecting to VM via VNC (127.0.0.1:5911)
==> vmware-iso: Typing the boot command over VNC...
==> vmware-iso: Waiting for SSH to become available...
==> vmware-iso: Connected to SSH!
==> vmware-iso: Provisioning with shell script: /tmp/packer-shell527574289                                                                        
==> vmware-iso: Running local shell script: /tmp/packer-shell536900249
    vmware-iso: 4bb8ee9c-5bfc-1977-6797-ed334dcbd96c
==> vmware-iso: Gracefully halting virtual machine...
    vmware-iso: Waiting for VMware to clean up after itself...
==> vmware-iso: Deleting unnecessary VMware files...
    vmware-iso: Deleting: output-debian-10-vmware-iso/vmware.log
==> vmware-iso: Compacting all attached virtual disks...
    vmware-iso: Compacting virtual disk 1
==> vmware-iso: Cleaning VMX prior to finishing up...
    vmware-iso: Detaching ISO from CD-ROM device ide0:0...
    vmware-iso: Disabling VNC server...
==> vmware-iso: Skipping export of virtual machine (export is allowed only for ESXi)...                                                           
Build 'vmware-iso' finished.

@arizvisa
Contributor

arizvisa commented Jun 8, 2020

Since the list of hosts is now checked linearly, with the cost depending on how many leases match (as opposed to before)... is there a noticeable difference for y'all in the time it takes to detect the new address compared with the previous method? It likely doesn't really matter, but is it as significant for you as it is for me?

Also, is the vmware builder the only one which uses the method of parsing the dhcp leases in order to determine the address of the guest?

@DanHam
Contributor Author

DanHam commented Jun 9, 2020

@arizvisa Just like to second the thanks above for the fix! Really appreciated!

I have to say, I didn't really time my build or watch it too closely, but I didn't notice any significant delays. Things worked pretty much as they did before for me.

@DanHam
Contributor Author

DanHam commented Jun 10, 2020

@arizvisa A quick update on my comment above.

I've now built a few boxes with the latest Packer build. I've had mixed results with respect to the time it takes for the new address to be picked up by Packer.

Sometimes the address is picked up quickly. Other times there is a very noticeable delay - in the region of minutes - while the box is sitting there post reboot waiting for Packer to connect.

@arizvisa
Contributor

Hmm... I wonder if performance could be improved slightly by sorting the list of leases that we parse in descending order (at the end of the PotentialGuestIP function) so that the newer leases are tried first. Another thing that might be worth trying is to make the connections in CommHost in parallel, as the logic is essentially the same as a portscanner.

Anyways, just some potential solutions to consider. I'll leave this experimentation up to the maintainers or another contributor for the moment, until I can find more time.

@SwampDragons
Contributor

Makes sense, thanks @arizvisa for getting it this far :)

@arizvisa
Contributor

course. i got u.

@ghost

ghost commented Jul 6, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jul 6, 2020