Cluster: Race condition with WaitForCluster #190

Open
codykonior opened this issue Oct 1, 2018 · 1 comment
Labels
enhancement The issue is an enhancement request. help wanted The issue is up for grabs for anyone in the community.

Comments

@codykonior
Contributor

codykonior commented Oct 1, 2018

Details of the scenario you tried and the problem that is occurring

This is the expected way of doing things which usually works:

  • xCluster on first node
  • xCluster DependsOn xWaitForCluster on all other nodes
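
For readers unfamiliar with the pattern, the layout above might look roughly like the sketch below. The node names, cluster name, IP address, and retry values are placeholders rather than details from this report, and prerequisites such as installing the Failover-Clustering feature are left out.

```powershell
Configuration ClusterLab
{
    param
    (
        [Parameter(Mandatory = $true)]
        [System.Management.Automation.PSCredential]
        $DomainAdministratorCredential
    )

    Import-DscResource -ModuleName xFailOverCluster

    # First node: creates the cluster. All names and addresses are placeholders.
    Node 'NODE1'
    {
        xCluster CreateCluster
        {
            Name                          = 'CLUSTER1'
            StaticIPAddress               = '192.168.1.50/24'
            DomainAdministratorCredential = $DomainAdministratorCredential
        }
    }

    # Remaining nodes: wait until the cluster responds, then join it.
    Node @('NODE2', 'NODE3', 'NODE4', 'NODE5')
    {
        xWaitForCluster WaitForCluster
        {
            Name             = 'CLUSTER1'
            RetryIntervalSec = 10
            RetryCount       = 60
        }

        xCluster JoinCluster
        {
            Name                          = 'CLUSTER1'
            StaticIPAddress               = '192.168.1.50/24'
            DomainAdministratorCredential = $DomainAdministratorCredential
            DependsOn                     = '[xWaitForCluster]WaitForCluster'
        }
    }
}
```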

I'm building 1+4 node clusters (all in one subnet). In configurations with this many nodes a race condition occurs: xWaitForCluster believes the cluster is ready, but xCluster then fails randomly during Test-Resource with a "The name used to access the cluster is not currently available" message.

So the create works, and one or two node additions will pass, and the rest will fail. This is rectified about 10-15 minutes later when DSC re-runs the configuration, but that's as long again as building the entire lab.

I can only imagine there is a short window where the cluster exists but adding a node to it isn't functional just yet. I replaced xWaitForCluster with WaitForAll on the first node's cluster. This seemed to work more reliably for single-subnet clusters, but for multi-subnet clusters (where adding the subnet might briefly take the cluster offline) it can also fail.
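
As an illustration only, the WaitForAll variant could be sketched as follows, assuming it points at the first node's xCluster resource. The resource instance names, node names, address, and retry values are placeholders and not taken from the actual lab configuration.

```powershell
Configuration JoinAfterCreate
{
    param
    (
        [Parameter(Mandatory = $true)]
        [System.Management.Automation.PSCredential]
        $DomainAdministratorCredential
    )

    Import-DscResource -ModuleName PSDesiredStateConfiguration
    Import-DscResource -ModuleName xFailOverCluster

    Node 'NODE2'
    {
        # Wait for the xCluster resource on the first node to reach its
        # desired state, instead of polling the cluster name directly.
        # '[xCluster]CreateCluster' and 'NODE1' are placeholder names.
        WaitForAll ClusterCreated
        {
            ResourceName     = '[xCluster]CreateCluster'
            NodeName         = 'NODE1'
            RetryIntervalSec = 10
            RetryCount       = 60
        }

        xCluster JoinCluster
        {
            Name                          = 'CLUSTER1'
            StaticIPAddress               = '192.168.1.50/24'
            DomainAdministratorCredential = $DomainAdministratorCredential
            DependsOn                     = '[WaitForAll]ClusterCreated'
        }
    }
}
```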

Suggested solution to the issue

What would you think about adding RetryIntervalSec (defaulting to 10) and RetryCount (defaulting to 0) parameters to the xCluster resource? The resource could then retry whatever operation it is doing when it hits a transient failure, such as the cluster flipping on and off during other nodes' operations. That would avoid throwing transient errors to the DSC logs and waiting for the entire process to re-run.
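
To make the idea concrete, here is a minimal sketch of the kind of retry loop this could mean. `Invoke-WithRetry` is a hypothetical helper, not an existing function in the module; only the parameter names and defaults come from the suggestion above, and with RetryCount at 0 the operation runs exactly once.

```powershell
function Invoke-WithRetry
{
    [CmdletBinding()]
    param
    (
        # The operation to attempt; it must throw a terminating error on failure.
        [Parameter(Mandatory = $true)]
        [System.Management.Automation.ScriptBlock]
        $ScriptBlock,

        [Parameter()]
        [System.UInt32]
        $RetryIntervalSec = 10,

        # Number of additional attempts after the first one fails.
        [Parameter()]
        [System.UInt32]
        $RetryCount = 0
    )

    for ($attempt = 0; $attempt -le $RetryCount; $attempt++)
    {
        try
        {
            return (& $ScriptBlock)
        }
        catch
        {
            if ($attempt -ge $RetryCount)
            {
                # Out of retries: surface the last error to the caller.
                throw
            }

            Write-Verbose -Message ('Attempt {0} failed: {1}' -f ($attempt + 1), $_.Exception.Message)
            Start-Sleep -Seconds $RetryIntervalSec
        }
    }
}
```

The resource would wrap only the call that can fail transiently, so a brief outage costs a few seconds of waiting rather than a full DSC consistency cycle.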

The operating system the target node is running

Windows Server 2012

Version and build of PowerShell the target node is running

WMF 5.1

Version of the DSC module that was used ('dev' if using current dev branch)

dev

@johlju
Member

johlju commented Oct 1, 2018

Yes, that sounds like it could work. But I think we could leave out the default values of those two new parameters, and only do the retry test if either of them is in PSBoundParameters. If either of them is not assigned, an error can be thrown indicating that both must be set. Or, if either is not set, then we could set it to a default value (in the code, and not on the parameter).
I think, if a parameter is not used in a configuration we should avoid having it set to a default value. 🤔
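
A rough sketch of the first variant (throw unless both are set) might look like this. `Get-RetrySetting` is a hypothetical helper name used only for illustration; a resource function would pass its own `$PSBoundParameters` to it.

```powershell
function Get-RetrySetting
{
    [CmdletBinding()]
    param
    (
        # Typically the caller's $PSBoundParameters.
        [Parameter(Mandatory = $true)]
        [System.Collections.IDictionary]
        $BoundParameters
    )

    $hasInterval = $BoundParameters.ContainsKey('RetryIntervalSec')
    $hasCount = $BoundParameters.ContainsKey('RetryCount')

    if (-not ($hasInterval -or $hasCount))
    {
        # Neither parameter was used in the configuration: no retry at all.
        return $null
    }

    if (-not ($hasInterval -and $hasCount))
    {
        throw 'RetryIntervalSec and RetryCount must both be set.'
    }

    return @{
        RetryIntervalSec = $BoundParameters['RetryIntervalSec']
        RetryCount       = $BoundParameters['RetryCount']
    }
}
```

For the second variant, the `throw` would be replaced by filling in the missing value inside the function body, which keeps the parameter itself free of a declared default.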

@johlju johlju added enhancement The issue is an enhancement request. help wanted The issue is up for grabs for anyone in the community. labels Oct 1, 2018
@SteveL-MSFT SteveL-MSFT added this to Help Wanted in powershell/dscresources May 14, 2019
@SteveL-MSFT SteveL-MSFT removed this from Help Wanted in powershell/dscresources Nov 27, 2019
@johlju johlju changed the title from "xCluster: Race condition with xWaitForCluster" to "Cluster: Race condition with WaitForCluster" Jun 11, 2022