Same Hardware assigned to multiple TinkerbellMachines when maxUnavailable >= 2 in MachineDeployment #330

Open
rockc2020 opened this issue Dec 2, 2023 · 2 comments
Labels: kind/bug

Comments

@rockc2020

I set up a Cluster API environment with the Tinkerbell provider, plus a Tinkerbell stack on a single server, by following https://github.com/tinkerbell/cluster-api-provider-tinkerbell/tree/main/docs. It successfully provisioned a workload K8s cluster (3 control plane nodes + 3 worker nodes) where all servers are physical machines.
While testing a rolling restart of the MachineDeployment that contains the 3 worker nodes, I set maxUnavailable: 3 (an extreme case, to see how refreshing nodes in parallel behaves; a refresh is actually a re-image + rejoin). A single Hardware then got assigned to two TinkerbellMachines, and the rolling restart got stuck.

Expected Behaviour

The nodes managed by the MachineDeployment should each be linked to a different Hardware, even when multiple nodes are being refreshed or restarted at the same time.

Current Behaviour

The Hardware n62-107-74 was linked to two TinkerbellMachines: capi-quickstart-worker-a-2kjwb and capi-quickstart-worker-a-h9f8z.
[screenshot: Hardware n62-107-74 linked to both TinkerbellMachines]

Possible Solution

Explained in the Context section below.

Steps to Reproduce (for bugs)

  1. Provision a MachineDeployment with multiple nodes (>= 2).
  2. Set the strategy of the MachineDeployment:

     strategy:
       rollingUpdate:
         maxSurge: 0
         maxUnavailable: 3 # Just set it to 3 for an easy repro.
       type: RollingUpdate

  3. Trigger a rolling restart of the MachineDeployment with clusterctl:

     clusterctl alpha rollout restart machinedeployment/capi-quickstart-worker-a

  4. All 3 Machines are deleted first (including their TinkerbellMachines and Workflows) and then enter the provisioning stage.
  5. It is then very likely that a single Hardware ends up linked to multiple TinkerbellMachines (2 or 3); a small helper to confirm this is sketched below.
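
As an illustration only (not part of the provider; the function name and the imports it assumes are mine), here is a minimal controller-runtime sketch that lists TinkerbellMachines and reports any Hardware referenced by more than one of them:

// Assumed imports: "context", "sigs.k8s.io/controller-runtime/pkg/client",
// and infrav1 (the provider's API package that defines TinkerbellMachine).
func duplicateHardwareAssignments(ctx context.Context, c client.Client) (map[string][]string, error) {
	var machines infrav1.TinkerbellMachineList
	if err := c.List(ctx, &machines); err != nil {
		return nil, err
	}

	// Group TinkerbellMachine names by the Hardware they reference.
	byHardware := map[string][]string{}
	for _, m := range machines.Items {
		if hw := m.Spec.HardwareName; hw != "" {
			byHardware[hw] = append(byHardware[hw], m.Name)
		}
	}

	// Keep only Hardware referenced by more than one TinkerbellMachine, e.g.
	// n62-107-74 -> [capi-quickstart-worker-a-2kjwb capi-quickstart-worker-a-h9f8z].
	duplicates := map[string][]string{}
	for hw, owners := range byHardware {
		if len(owners) > 1 {
			duplicates[hw] = owners
		}
	}

	return duplicates, nil
}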

Context

This issue blocks node upgrades and restarts.
After digging into the code, it looks like the ownership labels on the Hardware get deleted twice (or even 3 times).
Let's use the case above as an example; here are the labels on the Hardware:
[screenshot: ownership labels on Hardware n62-107-74]

It could happen like this:

  1. The 3 machines are being deleted.
  2. The reconcile of machine n62-107-74 calls the DeleteMachineWithDependencies() method, which deletes the ownership labels on the Hardware and then creates a PowerOff bmc job to shut down the n62-107-74 machine.
  3. The reconcile of machine n62-107-78 may be faster and have already completed its bmc job, so it calls ensureHardware() to select a Hardware for itself. It can select Hardware n62-107-74, because that Hardware's ownership labels were deleted in step 2, and it adds its own ownership labels, e.g. v1alpha1.tinkerbell.org/ownerName=capi-quickstart-worker-a-h9f8z.
  4. The reconcile of machine n62-107-74 calls DeleteMachineWithDependencies() again to check whether the PowerOff bmc job has completed. Inside DeleteMachineWithDependencies() it deletes the ownership labels again, even though those labels were just created in step 3 by the reconcile of machine n62-107-78.
  5. The reconcile of machine n62-107-74 continues once the PowerOff bmc job completes, so it calls ensureHardware() to select a Hardware for itself. It can again select Hardware n62-107-74, because the ownership labels were deleted in step 4, and it sets the ownership label to v1alpha1.tinkerbell.org/ownerName=capi-quickstart-worker-a-2kjwb.
  6. As a result, the two machines n62-107-74 and n62-107-78 end up linked to the same Hardware (n62-107-74); a minimal sketch of the selection window this opens is shown below.
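
To make that window concrete, here is a minimal, hypothetical sketch of a "pick any Hardware without an ownerName label" selection. It is not the provider's actual ensureHardware() implementation; the function name and the imports it assumes ("context", "errors", "k8s.io/apimachinery/pkg/labels", "sigs.k8s.io/controller-runtime/pkg/client", plus the tinkv1 Hardware API alias used in the code further below) are mine:

const ownerNameLabel = "v1alpha1.tinkerbell.org/ownerName"

// nextAvailableHardware is a hypothetical stand-in for the selection done by
// ensureHardware(): any Hardware without an ownerName label is treated as free.
func nextAvailableHardware(ctx context.Context, c client.Client) (*tinkv1.Hardware, error) {
	// "!<key>" selects objects that do not carry the label at all.
	selector, err := labels.Parse("!" + ownerNameLabel)
	if err != nil {
		return nil, err
	}

	var hwList tinkv1.HardwareList
	if err := c.List(ctx, &hwList, &client.ListOptions{LabelSelector: selector}); err != nil {
		return nil, err
	}

	if len(hwList.Items) == 0 {
		return nil, errors.New("no available hardware")
	}

	// The moment DeleteMachineWithDependencies() strips the labels (steps 2 and 4 above),
	// the Hardware being powered off shows up in this list for any other reconcile
	// that happens to run in that window.
	return &hwList.Items[0], nil
}

Whichever reconcile lists first in that window claims the Hardware, which is exactly how two TinkerbellMachines end up pointing at n62-107-74.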

A possible solution is to move the deletion of the ownership labels to after the check that the PowerOff bmc job has completed, so the labels stay on the Hardware (and another machine's ensureHardware() cannot grab it) until the machine is actually powered off. Here is the code I used, which works well in my environment.

// DeleteMachineWithDependencies removes template and workflow objects associated with the given
// machine, and only removes the Hardware ownership labels after the BMC power-off job completes.
func (scope *machineReconcileScope) DeleteMachineWithDependencies() error {
	scope.log.Info("Removing machine", "hardwareName", scope.tinkerbellMachine.Spec.HardwareName)

	// Fetch the Hardware for the machine.
	hardware := &tinkv1.Hardware{}
	if err := scope.getHardwareForMachine(hardware); err != nil {
		return err
	}

	// getOrCreateBMCPowerOffJob() is a method I wrote that creates (or fetches) the PowerOff job only.
	bmcJob, err := scope.getOrCreateBMCPowerOffJob(hardware)
	if err != nil {
		return err
	}

	// Surface a failed power-off job instead of waiting on it forever.
	if bmcJob.HasCondition(rufiov1.JobFailed, rufiov1.ConditionTrue) {
		return fmt.Errorf("bmc job %s/%s failed", bmcJob.Namespace, bmcJob.Name) //nolint:goerr113
	}

	// Requeue until the power-off job reports completion.
	if !bmcJob.HasCondition(rufiov1.JobCompleted, rufiov1.ConditionTrue) {
		scope.log.Info("Waiting for power off BMCJob to complete",
			"Name", bmcJob.Name,
			"Namespace", bmcJob.Namespace,
		)

		return nil
	}

	// Only remove the ownership labels here, once the BMC PowerOff job has completed.
	if err := scope.removeDependencies(hardware); err != nil {
		return err
	}

	// If the Hardware has no BMCRef, there is nothing to power off.
	if hardware.Spec.BMCRef == nil {
		scope.log.Info("Hardware BMC reference not present; skipping hardware power off",
			"BMCRef", hardware.Spec.BMCRef, "Hardware", hardware.Name)
	}

	// Remove finalizers and let the machine object delete.
	return scope.removeFinalizer()
}

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Debian 10

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    I deployed the Tinkerbell stack as the sandbox setup with docker-compose on a single server.

  • Link to your project or a code example to reproduce issue:
    N/A

@chrisdoherty4
Member

Thank you for the thorough report.

@chrisdoherty4 added the kind/bug label on Dec 2, 2023
@rockc2020
Author

> Thank you for the thorough report.

Sure, and feel free to let me know if that's the case; I'd be happy to raise a PR for the fix as well.
