Custom deployment failed #2530

potejasw opened this issue May 3, 2024 · 10 comments
Labels
triage issue or feature up for triage


potejasw commented May 3, 2024

Is your issue related to a Jumpstart scenario?
HCIBox

Describe the issue or the bug
OperationTimeout , No updates received from device for operation

Raw error:
{
"code": "ArcOperationTimedOut",
"target": "/subscriptions/3f3df5ee-74f3-4aa8-83d2-fa6558733b45/resourceGroups/PCTHCIBOX-rg/providers/Microsoft.HybridCompute/machines/AzSHOST1",
"message": "OperationTimeout , No updates received from device for operation: [providers/microsoft.azurestackhci/locations/EASTUS/operationStatuses/98438b4f-e55d-4580-9649-82be41c323d9*E803284F08085E1E43A65AF9A5F9852A3E3D07A9C81917EE350B85B2BFC1CABF?api-version=2023-08-01-preview] beyond timeout of [600000] ms"
}
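
For reference, the timeout quoted in the message is in milliseconds. A minimal Python sketch, assuming the error is available as a JSON string, pulls the value out and converts it to minutes. The `timeout_minutes` helper is hypothetical, and the operation ID in the sample is shortened for readability.

```python
import json
import re

# Sample payload based on the raw error above; the operation ID is shortened.
RAW_ERROR = """
{
  "code": "ArcOperationTimedOut",
  "message": "OperationTimeout , No updates received from device for operation: [providers/microsoft.azurestackhci/locations/EASTUS/operationStatuses/98438b4f-e55d-4580-9649-82be41c323d9?api-version=2023-08-01-preview] beyond timeout of [600000] ms"
}
"""


def timeout_minutes(raw: str) -> float:
    """Return the timeout quoted in the error message, converted to minutes."""
    error = json.loads(raw)
    match = re.search(r"beyond timeout of \[(\d+)\] ms", error["message"])
    if match is None:
        raise ValueError("no timeout found in error message")
    return int(match.group(1)) / 60_000


print(timeout_minutes(RAW_ERROR))  # 600000 ms is 10 minutes
```

In other words, the operation gave up after 10 minutes without receiving an update from the device.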

To Reproduce

Expected behavior
Complete the custom deployment of HCI box.

Environment summary
Az HCI 23H2

Have you looked at the Troubleshooting and Logs section?

Screenshots

image

image

Additional context
HCI deployment.

potejasw added the triage label (issue or feature up for triage) on May 3, 2024

potejasw commented May 6, 2024

Hi team, do you have any update, or has an engineer been assigned?

likamrat (Contributor) commented May 6, 2024

Hi @potejasw, thx for opening the issue. We will have someone assigned to this in a few days, as we are currently getting ready for a few major releases. Thx for your patience and understanding.

katriendg commented

I also wanted to add that the HCIBox Jumpstart deployed with the CLI option fails after the New-HCIBoxCluster.ps1 script has been running for a few hours at logon. Step 10 fails during validation, and the portal shows the following error message (note that I removed my resource names):

{"code":"UpdateDeploymentSettingsDataFailed","message":"Deployment Settings validation failed.","details":
[{"code":"UpdateDeploymentSettingsDataFailed","target":"/subscriptions/[.......]/resourceGroups/[.......]/providers/Microsoft.AzureStackHCI/clusters/hciboxcluster","message":"Failed to create deployment settings. \nValidation status is {Status=Error, Steps={Name=Error, Description=Error executing Request: Validate, FullStepIndex=0, StartTimeUtc=5/7/2024 4:17:38 PM, EndTimeUtc=NA, Status=Error, Exception=Exception: One or more errors occurred. at:   at 
System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)\r\n   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)\r\n   at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.ExecuteRequest(Request request) in 
C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 379 Base Exception: Failed to fetch secret:LocalAdminCredential
 from Key Vault https://[[.......]].vault.azure.net with:Response status code does not indicate success: 404 (Not Found). at:  
  at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.GetSecret(String keyVaultUri, String secretName) in C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 296\r\n   at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.<InitAnswerFileAndSecrets>d__9.MoveNext() in 
C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 253\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at Microsoft.AzureStack.Solution.Deploy.LCMController.ArcCommunication.ActionPlanController.<ExecuteMessagesFromResourceProvider>d__5.MoveNext() in C:\\__w\\1\\s\\src\\LCMController\\ArcCommunication\\Source\\LCMController.ArcCommunication\\ActionPlanController.cs:line 94, Steps=null}}. \nDeployment Status is {Status=, Steps=null}"}]}

The secret LocalAdminCredential does exist in the Key Vault.
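
The useful part of that message is buried in the stack trace. As a rough Python sketch, assuming the message has been copied out as a string, a regular expression can extract the secret name, vault URL, and HTTP status so they can be compared against what is actually in the Key Vault. The `key_vault_failure` helper is hypothetical, the excerpt is abbreviated, and `example-vault` is a placeholder because the vault name is redacted in the post.

```python
import re

# Abbreviated excerpt of the validation message above. The vault name is
# redacted in the original post, so "example-vault" is a placeholder.
MESSAGE = (
    "Exception=Exception: One or more errors occurred. at:   at "
    "System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions) "
    "Base Exception: Failed to fetch secret:LocalAdminCredential\n"
    " from Key Vault https://example-vault.vault.azure.net with:"
    "Response status code does not indicate success: 404 (Not Found). at:  "
)

# Matches the "Failed to fetch secret" sentence, allowing for the line break
# that appears between the secret name and "from Key Vault".
PATTERN = re.compile(
    r"Failed to fetch secret:(?P<secret>\S+)\s+from Key Vault (?P<vault>\S+) "
    r"with:.*?(?P<status>\d{3}) \((?P<reason>[^)]*)\)",
    re.DOTALL,
)


def key_vault_failure(message: str):
    """Return secret name, vault URL, status, and reason, or None if absent."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


print(key_vault_failure(MESSAGE))
```

The sketch only parses the text. It does not call Key Vault or explain why a secret that exists would return 404.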


potejasw commented May 9, 2024

How do I try the CLI option? We only have the two options below to create the cluster:
ARM template and Azure portal.

katriendg commented

@potejasw To clarify, I meant the Azure CLI tutorial (which deploys through Bicep/ARM) and not the Azure Developer CLI one.
https://azurearcjumpstart.io/azure_jumpstart_hcibox/deployment_az


potejasw commented May 9, 2024

@katriendg The HCIBox deployment completed and I can log in to the VM.
But I have an issue creating the cluster from the ARM template.

potejasw commented

@katriendg Do you have any news for me?

janegilring (Contributor) commented

@potejasw Could you give the following a try?

On the HCI nodes, navigate to C:\ProgramData\GuestConfig\extension_logs\Microsoft.Edge.DeviceManagementExtension\ and check the DeviceManagementExtension.log and state.json files for any error messages. If none are found, rename the EdgeDevice.txt file to EdgeDevice.old, which will regenerate the latest device information and push it up to the cloud within 15 minutes.
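
The steps above can be sketched in Python, assuming Python is available on the node (a PowerShell equivalent would work just as well). Several things here are assumptions rather than documented behavior: that EdgeDevice.txt sits in the same folder as the logs, that a case-insensitive search for "error" is a good enough filter, and the helper names `find_error_lines` and `refresh_device_info`, which are hypothetical.

```python
from pathlib import Path

# Folder named in the comment above.
LOG_DIR = Path(
    r"C:\ProgramData\GuestConfig\extension_logs\Microsoft.Edge.DeviceManagementExtension"
)


def find_error_lines(log_dir: Path) -> list[str]:
    """Collect lines mentioning 'error' from the two files named above."""
    hits = []
    for name in ("DeviceManagementExtension.log", "state.json"):
        path = log_dir / name
        if not path.exists():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            # Assumption: a plain substring match is enough to spot problems.
            if "error" in line.lower():
                hits.append(f"{name}: {line.strip()}")
    return hits


def refresh_device_info(log_dir: Path) -> list[str]:
    """Rename EdgeDevice.txt to EdgeDevice.old only when no errors are found."""
    hits = find_error_lines(log_dir)
    # Assumption: EdgeDevice.txt lives in the same folder as the logs.
    device_file = log_dir / "EdgeDevice.txt"
    if not hits and device_file.exists():
        device_file.rename(log_dir / "EdgeDevice.old")
    return hits


if __name__ == "__main__":
    for hit in refresh_device_info(LOG_DIR):
        print(hit)
```

If any error lines are printed, the file is left alone so the errors can be investigated first. On Windows the rename fails if an EdgeDevice.old file already exists, so a leftover from an earlier attempt would need to be removed beforehand.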

potejasw commented

@janegilring I tried the above action plan, then retried deploying the cluster using the ARM template, and it failed with the error below.
image

{"code":"UpdateDeploymentSettingsDataFailed","message":"Deployment Settings validation failed.","details":[{"code":"UpdateDeploymentSettingsDataFailed","target":"/subscriptions/xxxxxxxxxxxxxxx/resourceGroups/xxxxxx-rg/providers/Microsoft.AzureStackHCI/clusters/hciboxcluster","message":"Failed to create deployment settings. \nValidation status is {Status=Error, Steps={Name=SetRegistrationParametersInECEForCloudDeployment, Description=Set Registration parameters in ECE for cloud deployment., FullStepIndex=0, StartTimeUtc=2024-05-20T09:33:31, EndTimeUtc=2024-05-20T09:33:46, Status=Success, Exception=, Steps=}, {Name=InvokeEnvironmentChecker, Description=Invoke Environment Checker action plan., FullStepIndex=1, StartTimeUtc=2024-05-20T09:33:46, EndTimeUtc=2024-05-20T09:33:50, Status=Error, Exception=System.Collections.Generic.List`1[System.String], Steps=}}. \nDeployment Status is {Status=, Steps=null}"}]}
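
To see at a glance which validation step failed, the step list in that message can be parsed. The following Python sketch assumes the `Name=..., FullStepIndex=..., Status=...` field order seen in this message. The `failed_steps` helper is hypothetical and the sample string is an abbreviated excerpt.

```python
import re

# Excerpt of the validation status quoted above.
STATUS = (
    "Steps={Name=SetRegistrationParametersInECEForCloudDeployment, "
    "Description=Set Registration parameters in ECE for cloud deployment., "
    "FullStepIndex=0, StartTimeUtc=2024-05-20T09:33:31, EndTimeUtc=2024-05-20T09:33:46, "
    "Status=Success, Exception=, Steps=}, "
    "{Name=InvokeEnvironmentChecker, Description=Invoke Environment Checker action plan., "
    "FullStepIndex=1, StartTimeUtc=2024-05-20T09:33:46, EndTimeUtc=2024-05-20T09:33:50, "
    "Status=Error, Exception=System.Collections.Generic.List`1[System.String], Steps=}}"
)

# One match per step: its name, its index, and its own status.
STEP = re.compile(
    r"Name=(?P<name>\w+),.*?FullStepIndex=(?P<index>\d+),.*?Status=(?P<status>\w+),"
)


def failed_steps(status_text: str) -> list[tuple[int, str]]:
    """Return (index, name) for every step whose status is not Success."""
    return [
        (int(m["index"]), m["name"])
        for m in STEP.finditer(status_text)
        if m["status"] != "Success"
    ]


print(failed_steps(STATUS))
```

Here the failing step is InvokeEnvironmentChecker at index 1. Its Exception field appears to contain only a .NET type name, so the actual checker findings are not included in this message.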

janegilring (Contributor) commented

@potejasw Thanks for the update. At this point I would suggest deleting the resource group, running git pull in your local Jumpstart folder, and trying a fresh deployment.
