Same Hardware assigned to multiple TinkerbellMachines when maxUnavailable >= 2 in MachineDeployment #330

Open
rockc2020 opened this issue Dec 2, 2023 · 2 comments
Labels: kind/bug

Comments

@rockc2020

I set up a Cluster API environment with the Tinkerbell provider, plus a Tinkerbell stack on a single server, by following https://github.com/tinkerbell/cluster-api-provider-tinkerbell/tree/main/docs. It successfully provisioned a workload K8s cluster (3 control plane nodes + 3 worker nodes) where all servers are physical machines.
While testing a rolling restart of the MachineDeployment that contains the 3 worker nodes, I set maxUnavailable: 3 (an extreme case, to see how refreshing nodes in parallel behaves; a refresh is actually a re-image + rejoin). A single Hardware then got assigned to two TinkerbellMachines, and the rolling restart got stuck.

Expected Behaviour

The nodes managed by the MachineDeployment should each be linked to a different Hardware, even when multiple nodes are being refreshed or restarted at the same time.

Current Behaviour

The Hardware n62-107-74 was linked to two TinkerbellMachines: capi-quickstart-worker-a-2kjwb and capi-quickstart-worker-a-h9f8z.
[screenshot: Hardware n62-107-74 linked to both TinkerbellMachines]

Possible Solution

Explained in the Context section below.

Steps to Reproduce (for bugs)

  1. Provision a MachineDeployment with multiple nodes (>= 2).
  2. Set the strategy of the MachineDeployment:

     strategy:
       rollingUpdate:
         maxSurge: 0
         maxUnavailable: 3 # Just set it to 3 for an easy repro.
       type: RollingUpdate

  3. Trigger a rolling restart of the MachineDeployment with clusterctl:

     clusterctl alpha rollout restart machinedeployment/capi-quickstart-worker-a

  4. All 3 Machines are deleted first (including their TinkerbellMachines and Workflows) and then enter the provisioning stage.
  5. It is then very likely that a single Hardware ends up linked to multiple TinkerbellMachines (2 or 3); a small helper to confirm this is sketched below.
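
As an illustration only (not part of the provider; the function name and the imports it assumes are mine), here is a minimal controller-runtime sketch that lists TinkerbellMachines and reports any Hardware referenced by more than one of them:

// Assumed imports: "context", "sigs.k8s.io/controller-runtime/pkg/client",
// and infrav1 (the provider's API package that defines TinkerbellMachine).
func duplicateHardwareAssignments(ctx context.Context, c client.Client) (map[string][]string, error) {
	var machines infrav1.TinkerbellMachineList
	if err := c.List(ctx, &machines); err != nil {
		return nil, err
	}

	// Group TinkerbellMachine names by the Hardware they reference.
	byHardware := map[string][]string{}
	for _, m := range machines.Items {
		if hw := m.Spec.HardwareName; hw != "" {
			byHardware[hw] = append(byHardware[hw], m.Name)
		}
	}

	// Keep only Hardware referenced by more than one TinkerbellMachine, e.g.
	// n62-107-74 -> [capi-quickstart-worker-a-2kjwb capi-quickstart-worker-a-h9f8z].
	duplicates := map[string][]string{}
	for hw, owners := range byHardware {
		if len(owners) > 1 {
			duplicates[hw] = owners
		}
	}

	return duplicates, nil
}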

Context

This issue blocks node upgrades and restarts.
After digging into the code, it looks like the ownership labels on the Hardware get deleted twice (or even 3 times).
Let's use the case above as an example; here are the labels on the Hardware:
[screenshot: ownership labels on Hardware n62-107-74]

It could happen like this:

  1. The 3 machines are being deleted.
  2. The reconcile of machine n62-107-74 calls the DeleteMachineWithDependencies() method, which deletes the ownership labels on the Hardware and then creates a PowerOff bmc job to shut down the n62-107-74 machine.
  3. The reconcile of machine n62-107-78 may be faster and have already completed its bmc job, so it calls ensureHardware() to select a Hardware for itself. It can select Hardware n62-107-74, because that Hardware's ownership labels were deleted in step 2, and it adds its own ownership labels, e.g. v1alpha1.tinkerbell.org/ownerName=capi-quickstart-worker-a-h9f8z.
  4. The reconcile of machine n62-107-74 calls DeleteMachineWithDependencies() again to check whether the PowerOff bmc job has completed. Inside DeleteMachineWithDependencies() it deletes the ownership labels again, even though those labels were just created in step 3 by the reconcile of machine n62-107-78.
  5. The reconcile of machine n62-107-74 continues once the PowerOff bmc job completes, so it calls ensureHardware() to select a Hardware for itself. It can again select Hardware n62-107-74, because the ownership labels were deleted in step 4, and it sets the ownership label to v1alpha1.tinkerbell.org/ownerName=capi-quickstart-worker-a-2kjwb.
  6. As a result, the two machines n62-107-74 and n62-107-78 end up linked to the same Hardware (n62-107-74); a minimal sketch of the selection window this opens is shown below.
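
To make that window concrete, here is a minimal, hypothetical sketch of a "pick any Hardware without an ownerName label" selection. It is not the provider's actual ensureHardware() implementation; the function name and the imports it assumes ("context", "errors", "k8s.io/apimachinery/pkg/labels", "sigs.k8s.io/controller-runtime/pkg/client", plus the tinkv1 Hardware API alias used in the code further below) are mine:

const ownerNameLabel = "v1alpha1.tinkerbell.org/ownerName"

// nextAvailableHardware is a hypothetical stand-in for the selection done by
// ensureHardware(): any Hardware without an ownerName label is treated as free.
func nextAvailableHardware(ctx context.Context, c client.Client) (*tinkv1.Hardware, error) {
	// "!<key>" selects objects that do not carry the label at all.
	selector, err := labels.Parse("!" + ownerNameLabel)
	if err != nil {
		return nil, err
	}

	var hwList tinkv1.HardwareList
	if err := c.List(ctx, &hwList, &client.ListOptions{LabelSelector: selector}); err != nil {
		return nil, err
	}

	if len(hwList.Items) == 0 {
		return nil, errors.New("no available hardware")
	}

	// The moment DeleteMachineWithDependencies() strips the labels (steps 2 and 4 above),
	// the Hardware being powered off shows up in this list for any other reconcile
	// that happens to run in that window.
	return &hwList.Items[0], nil
}

Whichever reconcile lists first in that window claims the Hardware, which is exactly how two TinkerbellMachines end up pointing at n62-107-74.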

A possible solution is to move the deletion of the ownership labels to after the check that the PowerOff bmc job has completed, so the labels stay on the Hardware (and another machine's ensureHardware() cannot grab it) until the machine is actually powered off. Here is the code I used, which works well in my environment.

// DeleteMachineWithDependencies removes template and workflow objects associated with the given
// machine, and only removes the Hardware ownership labels after the BMC power-off job completes.
func (scope *machineReconcileScope) DeleteMachineWithDependencies() error {
	scope.log.Info("Removing machine", "hardwareName", scope.tinkerbellMachine.Spec.HardwareName)

	// Fetch the Hardware for the machine.
	hardware := &tinkv1.Hardware{}
	if err := scope.getHardwareForMachine(hardware); err != nil {
		return err
	}

	// getOrCreateBMCPowerOffJob() is a method I wrote that creates (or fetches) the PowerOff job only.
	bmcJob, err := scope.getOrCreateBMCPowerOffJob(hardware)
	if err != nil {
		return err
	}

	// Surface a failed power-off job instead of waiting on it forever.
	if bmcJob.HasCondition(rufiov1.JobFailed, rufiov1.ConditionTrue) {
		return fmt.Errorf("bmc job %s/%s failed", bmcJob.Namespace, bmcJob.Name) //nolint:goerr113
	}

	// Requeue until the power-off job reports completion.
	if !bmcJob.HasCondition(rufiov1.JobCompleted, rufiov1.ConditionTrue) {
		scope.log.Info("Waiting for power off BMCJob to complete",
			"Name", bmcJob.Name,
			"Namespace", bmcJob.Namespace,
		)

		return nil
	}

	// Only remove the ownership labels here, once the BMC PowerOff job has completed.
	if err := scope.removeDependencies(hardware); err != nil {
		return err
	}

	// If the Hardware has no BMCRef, there is nothing to power off.
	if hardware.Spec.BMCRef == nil {
		scope.log.Info("Hardware BMC reference not present; skipping hardware power off",
			"BMCRef", hardware.Spec.BMCRef, "Hardware", hardware.Name)
	}

	// Remove finalizers and let the machine object delete.
	return scope.removeFinalizer()
}

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Debian 10

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    I deployed the Tinkerbell stack as the sandbox setup with docker-compose on a single server.

  • Link to your project or a code example to reproduce issue:
    N/A

@chrisdoherty4
Member

Thank you for the thorough report.

@chrisdoherty4 added the kind/bug label on Dec 2, 2023
@rockc2020
Author

> Thank you for the thorough report.

Sure, and feel free to let me know if that's the case; I'd be happy to raise a PR for the fix as well.
