This repository has been archived by the owner on Dec 11, 2020. It is now read-only.

Register-PSSessionConfiguration causes WinRM service hanging in state 'stopping' #30

Open
jnury opened this issue Nov 17, 2017 · 15 comments

@jnury
Contributor

jnury commented Nov 17, 2017

Hi,
I use DSC to deploy JEA configuration on many Windows Server 2012 R2 hosts:

PS > $psversiontable

Name                           Value
----                           -----
PSVersion                      5.1.14409.1012
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.14409.1012
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

About 3 times out of 4, when Register-PSSessionConfiguration is triggered by the DSC module, the WinRM service is restarted but hangs in the Stopping state.

It seems to happen more frequently when the configuration changes the account WinRM logs on as (from Network Service to Local System).

Is there a 'correct' way to avoid this behaviour?

We use the following script to force-restart the WinRM service (through SCCM, since we lose PowerShell remoting on the host):

$winRMService = Get-Service -Name 'WinRM'
if ($winRMService -and $winRMService.Status -eq 'StopPending') {
    # Find the svchost process hosting WinRM and every service sharing that process
    $processId = Get-CimInstance -ClassName 'Win32_Service' -Filter "Name = 'WinRM'" | Select-Object -ExpandProperty 'ProcessId'
    $serviceList = Get-CimInstance -ClassName 'Win32_Service' -Filter "ProcessId = $processId" | Select-Object -ExpandProperty 'Name'
    $failure = @()
    Write-Host "Forcing process $processId to stop ..." -NoNewline
    try {
        # -ErrorAction Stop turns failures into terminating errors so the catch blocks actually run
        Stop-Process -Id $processId -Force -ErrorAction Stop
        Write-Host ' done'
        Write-Host 'Waiting 5 seconds'
        Start-Sleep -Seconds 5
        foreach ($service in $serviceList) {
            Write-Host "Starting service $service ..." -NoNewline
            try {
                Start-Service -Name $service -ErrorAction Stop
                Write-Host ' done'
            } catch {
                Write-Host ' failed'
                $failure += "Start service $service"
            }
        }
    } catch {
        Write-Host ' failed'
        $failure += "Kill WinRM process"
    }

    if ($failure) {
        throw "Failed to execute the following operation(s): $($failure -join ', ')"
    }
}

Should we add detection and mitigation of this WinRM restart problem directly to the DSC resource? I can provide a PR for that (with less verbose code ;-))

@jnury
Contributor Author

jnury commented Nov 17, 2017

@PaulHigin: are you aware of problems in Register-PSSessionConfiguration/WinRM that could explain this behavior (and issue #31 ) ?

@rpsqrd
Collaborator

rpsqrd commented Nov 17, 2017

/cc @manojampalam for the WinRM aspects

Thanks for reporting these issues, Julien. I've seen this behavior a few times, but nowhere near as frequently or consistently as you are describing. Have you seen this behavior on 2008 R2 / 2012 / 2016 as well, or just 2012 R2?

@jnury
Contributor Author

jnury commented Nov 17, 2017

So far I have deployed my configuration to 85 hosts, all running Windows Server 2012 R2.
I'll have a few targets on 2008 R2 soon, and I plan to deploy to 2016 soon too.
So it has only been tested on 2012 R2.

@rpsqrd rpsqrd added the bug label Nov 17, 2017
@djwork

djwork commented Dec 11, 2017

This bug reproduces consistently on 2016. I worked around it by running Register-PSSessionConfiguration inside a PowerShell job, waiting 10 seconds, and setting $global:DSCMachineStatus = 1 (which requests a reboot) at the end of the Set block.

I will try and make my code a bit cleverer and then post it
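A minimal sketch of the job-with-timeout approach described above, assuming PowerShell 5.1 or later. `Invoke-JobWithTimeout` is a hypothetical helper name, not part of the JEA module: it runs a script block in a background job, stops waiting after `-TimeoutSeconds`, reports whether the job finished, and always removes the job, including one that is still hung.

```powershell
# Hypothetical helper (not part of the JEA module): run a script block in a
# background job, wait at most $TimeoutSeconds, and always clean the job up.
function Invoke-JobWithTimeout {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [object[]] $ArgumentList,
        [int] $TimeoutSeconds = 10
    )

    $startArgs = @{ ScriptBlock = $ScriptBlock }
    if ($ArgumentList) { $startArgs.ArgumentList = $ArgumentList }
    $job = Start-Job @startArgs

    try {
        # Wait-Job emits nothing when the timeout elapses before the job finishes.
        $completed = [bool](Wait-Job -Job $job -Timeout $TimeoutSeconds)
        $output = @()
        $jobErrors = @()
        if ($completed) {
            try {
                $output = @(Receive-Job -Job $job -ErrorVariable jobErrors -ErrorAction SilentlyContinue)
            }
            catch {
                # A failed job's reason may surface as a terminating error.
                $jobErrors = @($_)
            }
        }
        [pscustomobject]@{
            Completed = $completed
            State     = $job.State.ToString()
            Output    = $output
            Errors    = @($jobErrors)
        }
    }
    finally {
        # -Force also stops and removes a job that is still running (the hung case).
        Remove-Job -Job $job -Force
    }
}

# Example (elevated Windows host; 'MyJeaEndpoint' and $psscPath are placeholders):
# $result = Invoke-JobWithTimeout -TimeoutSeconds 10 -ArgumentList 'MyJeaEndpoint', $psscPath -ScriptBlock {
#     param($name, $path)
#     Register-PSSessionConfiguration -Name $name -Path $path -Force -ErrorAction Stop
# }
# if (-not $result.Completed) { $global:DSCMachineStatus = 1 }  # ask the LCM for a reboot
```

Requesting the reboot only when the job did not finish is a design choice of this sketch; the workaround above sets `$global:DSCMachineStatus = 1` at the end of the Set block.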

@jnury
Contributor Author

jnury commented Dec 19, 2017

@djwork: what about calling Register-PSSessionConfiguration within a PowerShell job and looping until the job completes or a timeout (say, 30 seconds) expires? While in the loop, if the service has been in the Stopping state for more than, say, 5 seconds, we run the 'force restart' script I mentioned above.

It would be quite safe: WinRM isn't left hanging, the other services running in the same process are restarted as well, and the resource would be compliant on the first run.

Of course, patching WinRM to avoid hanging would be the best solution ;-)
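The loop proposed above could look roughly like this sketch. `Wait-RegistrationOrRecover` is a hypothetical name; the job check, the WinRM status lookup, and the recovery action are passed in as script blocks, and the 30-second overall timeout and 5-second threshold are the example values from the comment. `Get-Service` reports the Stopping state as `StopPending`.

```powershell
# Hypothetical sketch of the proposed loop. Dependencies are injected:
#   -IsDone    returns $true once the registration job has finished
#   -GetStatus returns the current WinRM status, e.g. 'Running' or 'StopPending'
#   -OnHung    recovery action, e.g. the force-restart script
function Wait-RegistrationOrRecover {
    param(
        [Parameter(Mandatory)] [scriptblock] $IsDone,
        [Parameter(Mandatory)] [scriptblock] $GetStatus,
        [Parameter(Mandatory)] [scriptblock] $OnHung,
        [int] $TimeoutSeconds = 30,
        [int] $StopPendingSeconds = 5,
        [int] $PollMilliseconds = 500
    )

    $overall = [System.Diagnostics.Stopwatch]::StartNew()
    $stopping = $null   # started when StopPending is first observed

    while ($overall.Elapsed.TotalSeconds -lt $TimeoutSeconds) {
        if (& $IsDone) { return 'Completed' }

        if ([string](& $GetStatus) -eq 'StopPending') {
            if ($null -eq $stopping) { $stopping = [System.Diagnostics.Stopwatch]::StartNew() }
            if ($stopping.Elapsed.TotalSeconds -ge $StopPendingSeconds) {
                $null = & $OnHung   # discard output so only the status string is returned
                return 'Recovered'
            }
        }
        else {
            $stopping = $null   # WinRM left StopPending, so reset the hang timer
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return 'TimedOut'
}

# Example wiring (Windows; $job comes from Start-Job, Repair-WinRM is a
# hypothetical wrapper around the force-restart script):
# Wait-RegistrationOrRecover -IsDone { $job.State -in 'Completed', 'Failed', 'Stopped' } `
#     -GetStatus { (Get-Service -Name WinRM).Status } -OnHung { Repair-WinRM }
```

Injecting the three actions keeps the timing logic separate from the service calls, so it can be exercised with fake statuses before touching a real host.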

@djwork

djwork commented Dec 20, 2017

@jnury
That's what I did

[DscResource()]
class JeaEndpoint
{
## The mandatory endpoint name. Use 'Microsoft.PowerShell' by default.
[DscProperty(Key)]
[string] $EndpointName = 'Microsoft.PowerShell'

## The mandatory role definition map to be used for the endpoint. This
## should be a string that represents the Hashtable used for the RoleDefinitions
## property in New-PSSessionConfigurationFile, such as:
## RoleDefinitions = '@{ Everyone = @{ RoleCapabilities = "BaseJeaCapabilities" } }'
[Dscproperty(Mandatory)]
[string] $RoleDefinitions

## The optional groups to be used when the endpoint is configured to
## run as a Virtual Account
[DscProperty()]
[string[]] $RunAsVirtualAccountGroups

## The optional Group Managed Service Account (GMSA) to use for this
## endpoint. If configured, will disable the default behaviour of
## running as a Virtual Account
[DscProperty()]
[string] $GroupManagedServiceAccount

## The optional directory for transcripts to be saved to
[DscProperty()]
[string] $TranscriptDirectory

## The optional startup script for the endpoint
[DscProperty()]
[string[]] $ScriptsToProcess

## The optional switch to enable mounting of a restricted user drive
[Dscproperty()]
[bool] $MountUserDrive

## The optional size of the user drive. The default is 50MB.
[Dscproperty()]
[long] $UserDriveMaximumSize

## The optional number of seconds to wait for registering the endpoint to complete.
## The default is 10 seconds.
[Dscproperty()]
[int] $HungRegistrationTimeout = 10

## The optional number of times to retry starting the WinRM service.
## The default is 10.
[Dscproperty()]
[int] $MaximumWinRMStartRetry = 10

## The optional expression declaring which domain groups (for example,
## two-factor authenticated users) connected users must be members of. This
## should be a string that represents the Hashtable used for the RequiredGroups
## property in New-PSSessionConfigurationFile, such as:
## RequiredGroups = '@{ And = "RequiredGroup1", @{ Or = "OptionalGroup1", "OptionalGroup2" } }'
[Dscproperty()]
[string] $RequiredGroups

## Applies the JEA configuration
[void] Set()
{
    $psscPath = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName() + ".pssc")

    ## Convert the RoleDefinitions string to the actual Hashtable
    $roleDefinitionsHash = $this.ConvertStringToHashtable($this.RoleDefinitions)

    $configurationFileArguments = @{
        Path = $psscPath
        RoleDefinitions = $roleDefinitionsHash
        SessionType = 'RestrictedRemoteServer'
    }

    if($this.RunAsVirtualAccountGroups -and $this.GroupManagedServiceAccount)
    {
        throw "The RunAsVirtualAccountGroups setting can not be used when a configuration is set to run as a Group Managed Service Account"
    }

    ## Set up the JEA identity
    if($this.RunAsVirtualAccountGroups)
    {
        $configurationFileArguments["RunAsVirtualAccount"] = $true
        $configurationFileArguments["RunAsVirtualAccountGroups"] = $this.RunAsVirtualAccountGroups
    }
    elseif($this.GroupManagedServiceAccount)
    {
        $configurationFileArguments["GroupManagedServiceAccount"] = $this.GroupManagedServiceAccount -replace '\$$', ''
    }
    else
    {
        $configurationFileArguments["RunAsVirtualAccount"] = $true
    }

    ## Transcripts
    if($this.TranscriptDirectory)
    {
        $configurationFileArguments["TranscriptDirectory"] = $this.TranscriptDirectory
    }

    ## Startup scripts
    if($this.ScriptsToProcess)
    {
        $configurationFileArguments["ScriptsToProcess"] = $this.ScriptsToProcess
    }

    ## Mount user drive
    if($this.MountUserDrive)
    {
        $configurationFileArguments["MountUserDrive"] = $this.MountUserDrive
    }

    ## User drive maximum size
    if($this.UserDriveMaximumSize)
    {
        $configurationFileArguments["UserDriveMaximumSize"] = $this.UserDriveMaximumSize
        $configurationFileArguments["MountUserDrive"] = $true
    }

    ## Required groups
    if($this.RequiredGroups)
    {
        ## Convert the RequiredGroups string to the actual Hashtable
        $requiredGroupsHash = $this.ConvertStringToHashtable($this.RequiredGroups)
        $configurationFileArguments["RequiredGroups"] = $requiredGroupsHash
    }

    ## Register the endpoint
    try
    {
        ## If we are replacing Microsoft.PowerShell, create a 'break the glass' endpoint
        if($this.EndpointName -eq "Microsoft.PowerShell")
        {
            $breakTheGlassName = "Microsoft.PowerShell.Restricted"
            if(-not (Get-PSSessionConfiguration -Name ($breakTheGlassName + "*") |
                Where-Object Name -eq $breakTheGlassName))
            {
                Register-PSSessionConfiguration -Name $breakTheGlassName
            }
        }

        ## Remove the previous one, if any.
        $existingConfiguration = Get-PSSessionConfiguration -Name ($this.EndpointName + "*") |
            Where-Object Name -eq $this.EndpointName

        if($existingConfiguration)
        {
            Unregister-PSSessionConfiguration -Name $this.EndpointName
        }

        ## Create the configuration file
        New-PSSessionConfigurationFile @configurationFileArguments
        ## Register-PSSessionConfiguration can hang because the WinRM service gets stuck in the Stopping (StopPending) state,
        ## so run it in a job to be able to handle a hung WinRM service.
        ## Note: Wait-Job emits nothing on timeout, so Remove-Job only removes the job if it finished in time.
        Start-Job -ScriptBlock {
            param($endpointName, $psscPath)
            Register-PSSessionConfiguration -Name $endpointName -Path $psscPath -Force -ErrorAction Stop
        } -ArgumentList ($this.EndpointName), $psscPath | Wait-Job -Timeout ($this.HungRegistrationTimeout) | Remove-Job -Force -ErrorAction SilentlyContinue
        ## -ArgumentList is used instead of $using: because it is unclear whether "$using:this.EndpointName" works here.

        ## If WinRM is still stopping after the job completed or exceeded $this.HungRegistrationTimeout, force-kill the underlying WinRM process.
        ## Get-Service reports the Stopping state as 'StopPending'; a comparison with 'Stopping' never matches.
        if ((Get-Service -Name WinRM).Status -eq 'StopPending') {
            $id = Get-WmiObject -Class Win32_Service -Filter "Name = 'WinRM'" | Select-Object -ExpandProperty ProcessId
            Stop-Process -Id $id -Force
        }

        ## If WinRM is stopped, try to start it, making at most $this.MaximumWinRMStartRetry attempts
        [int]$tryCount = 0
        while (((Get-Service -Name WinRM).Status -eq 'Stopped') -and ($tryCount -lt $this.MaximumWinRMStartRetry))
        {
            $tryCount++
            Write-Verbose -Message "Starting WinRM service (attempt $tryCount)"
            Start-Service -Name WinRM
            Start-Sleep -Seconds 1
        }

        ## Enable PowerShell logging on the system
        $basePath = "HKLM:\Software\Policies\Microsoft\Windows\PowerShell\ScriptBlockLogging"
        if(-not (Test-Path $basePath))
        {
            $null = New-Item $basePath -Force
        }
        Set-ItemProperty $basePath -Name EnableScriptBlockLogging -Value "1"
    }
    finally
    {
        Remove-Item $psscPath
    }
}

# Tests if the resource is in the desired state.
[bool] Test()
{
    $currentInstance = $this.Get()

    ## If this was configured with our mandatory property (RoleDefinitions), dig deeper
    if($currentInstance.RoleDefinitions)
    {
        if($currentInstance.EndpointName -ne $this.EndpointName)
        {
            Write-Verbose "EndpointName not equal: $($currentInstance.EndpointName)"
            return $false
        }

        ## Convert the RoleDefinitions string to the actual Hashtable
        $roleDefinitionsHash = $this.ConvertStringToHashtable($this.RoleDefinitions)
        Write-Verbose ($currentInstance.RoleDefinitions.GetType())

        if(-not $this.ComplexObjectsEqual($this.ConvertStringToHashtable($currentInstance.RoleDefinitions), $roleDefinitionsHash))
        {
            Write-Verbose "RoleDefinitions not equal: $($currentInstance.RoleDefinitions)"
            return $false
        }

        if(-not $this.ComplexObjectsEqual($currentInstance.RunAsVirtualAccountGroups, $this.RunAsVirtualAccountGroups))
        {
            Write-Verbose "RunAsVirtualAccountGroups not equal: $(ConvertTo-Json $currentInstance.RunAsVirtualAccountGroups -Depth 100)"
            return $false
        }

        if($currentInstance.GroupManagedServiceAccount -ne ($this.GroupManagedServiceAccount -replace '\$$', ''))
        {
            Write-Verbose "GroupManagedServiceAccount not equal: $($currentInstance.GroupManagedServiceAccount)"
            return $false
        }

        if($currentInstance.TranscriptDirectory -ne $this.TranscriptDirectory)
        {
            Write-Verbose "TranscriptDirectory not equal: $($currentInstance.TranscriptDirectory)"
            return $false
        }

        if(-not $this.ComplexObjectsEqual($currentInstance.ScriptsToProcess, $this.ScriptsToProcess))
        {
            Write-Verbose "ScriptsToProcess not equal: $(ConvertTo-Json $currentInstance.ScriptsToProcess -Depth 100)"
            return $false
        }

        if($currentInstance.MountUserDrive -ne $this.MountUserDrive)
        {
            Write-Verbose "MountUserDrive not equal: $($currentInstance.MountUserDrive)"
            return $false
        }

        if($currentInstance.UserDriveMaximumSize -ne $this.UserDriveMaximumSize)
        {
            Write-Verbose "UserDriveMaximumSize not equal: $($currentInstance.UserDriveMaximumSize)"
            return $false
        }

        ## Required groups (ConvertStringToHashtable maps a null string to an empty Hashtable)
        $requiredGroupsHash = $this.ConvertStringToHashtable($this.RequiredGroups)
        if(-not $this.ComplexObjectsEqual($this.ConvertStringToHashtable($currentInstance.RequiredGroups), $requiredGroupsHash))
        {
            Write-Verbose "RequiredGroups not equal: $(ConvertTo-Json $currentInstance.RequiredGroups -Depth 100)"
            return $false
        }

        return $true
    }
    else
    {
        return $false
    }
}

## A simple comparison for complex objects used in JEA configurations.
## We don't need anything extensive, as we should be the only ones changing
## them.
hidden [bool] ComplexObjectsEqual($object1, $object2)
{
    $json1 = ConvertTo-Json -InputObject $object1 -Depth 100
    Write-Verbose "Argument1: $json1"

    $json2 = ConvertTo-Json -InputObject $object2 -Depth 100
    Write-Verbose "Argument2: $json2"

    return ($json1 -eq $json2)
}

## Convert a string representing a Hashtable into a Hashtable
hidden [Hashtable] ConvertStringToHashtable($hashtableAsString)
{
    if ($hashtableAsString -eq $null)
    {
        $hashtableAsString = '@{}'
    }
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($hashtableAsString, [ref] $null, [ref] $null)
    $data = $ast.Find( { $args[0] -is [System.Management.Automation.Language.HashtableAst] }, $false )

    return [Hashtable] $data.SafeGetValue()
}

# Gets the resource's current state.
[JeaEndpoint] Get()
{
    $returnObject = New-Object JeaEndpoint

    $sessionConfiguration = $null

    ## Make sure WinRM is running before querying it, with a bounded number of start attempts
    [int]$tryCount = 0
    while (((Get-Service -Name WinRM).Status -ne 'Running') -and ($tryCount -lt 10))
    {
        $tryCount++
        Write-Verbose -Message "Starting WinRM service (attempt $tryCount)"
        Start-Service -Name WinRM
        Start-Sleep -Seconds 1
    }

    $winRMService = Get-Service -Name WinRM
    if (($null -ne $winRMService) -and ($winRMService.Status -eq 'Running')) {
        ## Get-PSSessionConfiguration fails if WinRM is not running
        $sessionConfiguration = Get-PSSessionConfiguration -Name ($this.EndpointName + "*") |
            Where-Object Name -eq $this.EndpointName
    }

    if((-not $sessionConfiguration) -or (-not $sessionConfiguration.ConfigFilePath))
    {
        return $returnObject
    }
    else
    {
        $configFileArguments = Import-PowerShellDataFile $sessionConfiguration.ConfigFilePath
        $rawConfigFileAst = [System.Management.Automation.Language.Parser]::ParseFile($sessionConfiguration.ConfigFilePath, [ref] $null, [ref] $null)
        $rawConfigFileArguments = $rawConfigFileAst.Find( { $args[0] -is [System.Management.Automation.Language.HashtableAst] }, $false )

        $returnObject.EndpointName = $sessionConfiguration.Name

        ## Convert the hashtable to a string, as that is the input format required by DSC
        $returnObject.RoleDefinitions = $rawConfigFileArguments.KeyValuePairs | Where-Object { $_.Item1.Extent.Text -eq 'RoleDefinitions' } | ForEach-Object { $_.Item2.Extent.Text }

        if($sessionConfiguration.RunAsVirtualAccountGroups)
        {
            $returnObject.RunAsVirtualAccountGroups = $sessionConfiguration.RunAsVirtualAccountGroups -split ';'
        }

        if($sessionConfiguration.GroupManagedServiceAccount)
        {
            $returnObject.GroupManagedServiceAccount = $sessionConfiguration.GroupManagedServiceAccount
        }

        if($configFileArguments.TranscriptDirectory)
        {
            $returnObject.TranscriptDirectory = $configFileArguments.TranscriptDirectory
        }

        if($configFileArguments.ScriptsToProcess)
        {
            $returnObject.ScriptsToProcess = $configFileArguments.ScriptsToProcess
        }

        if($configFileArguments.MountUserDrive)
        {
            $returnObject.MountUserDrive = $configFileArguments.MountUserDrive
        }

        if($configFileArguments.UserDriveMaximumSize)
        {
            $returnObject.UserDriveMaximumSize = $configFileArguments.UserDriveMaximumSize
        }

        if($configFileArguments.RequiredGroups)
        {
            $returnObject.RequiredGroups = $rawConfigFileArguments.KeyValuePairs | Where-Object { $_.Item1.Extent.Text -eq 'RequiredGroups' } | ForEach-Object { $_.Item2.Extent.Text }
        }

        return $returnObject
    }
}

}
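The two WinRM start loops in the class above (in `Set()` and `Get()`) share one shape, and as posted neither increments `$tryCount`, so both can spin forever. A bounded retry could be factored out along these lines. `Start-WithRetry` is a hypothetical name; the status lookup and the start action are injected as script blocks so the retry logic can be checked without a real service.

```powershell
# Hypothetical helper: call -Start until -GetStatus reports 'Running', making at
# most $MaxAttempts start attempts. Returns $true if the service ends up running.
function Start-WithRetry {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [scriptblock] $GetStatus,
        [Parameter(Mandatory)] [scriptblock] $Start,
        [int] $MaxAttempts = 10,
        [int] $DelaySeconds = 1
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        if ([string](& $GetStatus) -eq 'Running') { return $true }
        Write-Verbose "Start attempt $attempt of $MaxAttempts"
        try { $null = & $Start }
        catch { Write-Verbose "Start attempt $attempt failed: $_" }
        Start-Sleep -Seconds $DelaySeconds
    }
    # Final check after the last attempt
    return ([string](& $GetStatus) -eq 'Running')
}

# Possible use inside Set()/Get() (this helper is hypothetical):
# $running = Start-WithRetry -MaxAttempts $this.MaximumWinRMStartRetry `
#     -GetStatus { (Get-Service -Name WinRM).Status } `
#     -Start { Start-Service -Name WinRM -ErrorAction Stop }
```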

@jnury
Contributor Author

jnury commented Feb 25, 2018

Hi all,
Here is my proposed workaround for this bug: https://github.com/jnury/JEA/blob/issue%2330/DSC%20Resource/JustEnoughAdministration/JustEnoughAdministration.psm1

I've done a 'lot' of refactoring, so I would appreciate a code review before filing a PR ;-)

This is what I've done:

  • implementing the proposal from @djwork (with some small corrections)
  • adding restart of services that share the same process as WinRM
  • adding WinRM status verification before each call to xxx-PSSessionConfiguration
  • improving Verbose messages
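For the second point, finding the services that share WinRM's svchost process comes down to matching `Win32_Service.ProcessId`, as the force-restart script at the top of this issue does. A minimal sketch, assuming the service list is passed in (for example from `Get-CimInstance -ClassName Win32_Service`); `Get-CoHostedServiceName` is a hypothetical name.

```powershell
# Hypothetical helper: given Win32_Service-like objects (anything with Name and
# ProcessId properties), return the names of all services running in the same
# process as $ServiceName, including that service itself.
function Get-CoHostedServiceName {
    param(
        [Parameter(Mandatory)] [string] $ServiceName,
        [Parameter(Mandatory)] [object[]] $Service
    )

    $target = $Service | Where-Object Name -eq $ServiceName | Select-Object -First 1
    # A ProcessId of 0 means the service is not running, so it shares no process.
    if (-not $target -or $target.ProcessId -eq 0) { return }

    $Service | Where-Object ProcessId -eq $target.ProcessId | ForEach-Object Name
}

# On a real host:
# Get-CoHostedServiceName -ServiceName WinRM -Service (Get-CimInstance -ClassName Win32_Service)
```

Taking the service list as input rather than querying CIM inside the function keeps the matching logic testable on any machine.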

@PaulHigin

@manojampalam
It is a shame you have to do this. It would be better for the WinRM service to restart rather than hang on stopping. It might be due to the service host process not being able to restart for some reason. If this is the case then you should be able to ensure WinRM always runs in its own process. I am not familiar with WinRM, but Manoj may be able to help.
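A minimal sketch of the "own process" suggestion, using the standard `sc.exe config <name> type= own` switch. The helper name `Set-ServiceOwnProcess` is hypothetical, and whether running WinRM in its own process actually avoids the hang has not been tested in this thread. The change only takes effect after WinRM restarts, and `sc.exe config WinRM type= share` reverts it.

```powershell
# Hypothetical sketch (untested against this bug): move a shared-svchost service
# such as WinRM into its own process with sc.exe. Requires an elevated session;
# the change takes effect after the service is restarted.
function Set-ServiceOwnProcess {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)] [string] $Name)

    $service = Get-CimInstance -ClassName Win32_Service -Filter "Name = '$Name'"
    if (-not $service) { throw "Service '$Name' not found." }

    # Win32_Service.ServiceType is 'Share Process' for services grouped in a shared svchost.
    if ($service.ServiceType -ne 'Share Process') {
        Write-Verbose "$Name already has ServiceType '$($service.ServiceType)'."
        return
    }

    if ($PSCmdlet.ShouldProcess($Name, 'sc.exe config type= own')) {
        # sc.exe expects 'type=' and 'own' as two separate tokens.
        & sc.exe config $Name type= own | Out-Null
        if ($LASTEXITCODE -ne 0) { throw "sc.exe failed with exit code $LASTEXITCODE." }
    }
}

# Example: Set-ServiceOwnProcess -Name WinRM -WhatIf
```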

@manojampalam

We are looking into this now and will follow up.

This was referenced May 7, 2018
@jnury
Contributor Author

jnury commented May 10, 2018

Hello @manojampalam, any news?

The workaround shipped in PR #46 is really 'heavy', but it was triggered on half of my latest deployments, so it's really useful.

I hope you can fix the WinRM restart problem directly in the WinRM service so that we can remove the workaround some day.

If it helps: on some of my hosts, it seems that the LanmanWorkstation service (which is co-hosted in the same process as WinRM) ended in an error while WinRM was restarting after Register-PSSessionConfiguration.

@jnury
Contributor Author

jnury commented Jun 8, 2018

Hello guys
@rpsqrd: have you been able to look at PR #46?
@manojampalam: have you found anything in WinRM?

@cmyu-gh

cmyu-gh commented Jun 19, 2018

Hi jnury, I am Chenming YU; I work with Manoj on the WinRM area at Microsoft. Based on your symptoms, I suspect this is similar to a past case in which the pending WinRM action hung in the DSC timer (dsctimer) wakeup activity, which is hosted inside the WinRM service.
If so, the workaround is to call the Start-DSC* cmdlets with either:
1) -Force, to make sure any deadlock between WinRM and WMI is broken by cancelling the existing operation; or
2) the DCOM protocol instead of the WinRM protocol, to avoid errors while WinRM is transitioning through its start-stop-start states.

To check whether the DSC timer wakeup is active within WinRM:

  1. Check whether this registry value is 1:
    HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled
  2. Check whether these files exist under %windir%\system32\Configuration:
    "MetaConfig.mof"
    "Pending.mof"

If it still reproduces with those workarounds, please send me a memory dump of the hung WinRM service (svchost.exe; in Task Manager, select the process >> "Create dump file").
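The check above, as refined in the follow-up comment (value 1, or value 2 with .mof files present in the Configuration folder), could be encoded as follows. `Test-DscTimerSuspected` is a hypothetical name; the registry value is passed in so the logic does not depend on the registry, and treating a missing value as "not suspected" is an assumption of this sketch.

```powershell
# Hypothetical helper: the DSC timer plugin in WinRM is suspected when
# DSCAutomationHostEnabled is 1, or when it is 2 and .mof files exist in the
# DSC configuration folder. A missing value ($null) is treated as not suspected.
function Test-DscTimerSuspected {
    param(
        [Nullable[int]] $AutomationHostEnabled,
        [Parameter(Mandatory)] [string] $ConfigurationPath
    )

    if ($AutomationHostEnabled -eq 1) { return $true }
    if ($AutomationHostEnabled -eq 2) {
        $mofFiles = @(Get-ChildItem -Path $ConfigurationPath -Filter '*.mof' -File -ErrorAction SilentlyContinue)
        return ($mofFiles.Count -gt 0)
    }
    return $false
}

# On an affected host:
# $value = (Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' `
#     -Name DSCAutomationHostEnabled -ErrorAction SilentlyContinue).DSCAutomationHostEnabled
# Test-DscTimerSuspected -AutomationHostEnabled $value -ConfigurationPath "$env:windir\System32\Configuration"
```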

@jnury
Contributor Author

jnury commented Oct 16, 2018

Hi @cmyu-gh, it seems I missed your answer, sorry for that.

I'm not able to use Start-DSC* with -Force because I use Pull mode, so the configuration is triggered automatically.

Does the second option apply to Pull mode as well, or is it only for the Start-DSC* commands?

@cmyu-gh
Copy link

cmyu-gh commented Oct 22, 2018

Sorry about the late response. In your situation, can you check the following registry value on the machines that reproduce the problem: HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled

If it is 1, or it is 2 and .mof files exist under %windir%\system32\Configuration, that would support my suspicion about the dsctimer plugin of WinRM. Otherwise, can you forward me a dump of the hung WinRM service for further analysis?

For pull mode, as in your case, there is a workaround that directly stops the dsctimer plugin within WinRM (the change needs a Restart-Service WinRM to take effect):

Set HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled to 0.

  • After restarting the WinRM service, retry your execution.

**Please ignore the second option I posted before: it refers to forcing the CIM session protocol to DCOM instead of WinRM, a switch available in some management cmdlets.
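A sketch of that workaround, assuming an elevated local session (for example through SCCM, since restarting WinRM drops remote sessions). `Disable-DscTimerPlugin` is a hypothetical name, and writing the value as a DWORD is an assumption. As djwork reports further down, this setting may stop DSC from applying settings after a reboot, so it is worth testing before broad use.

```powershell
# Hypothetical sketch: disable the DSC timer plugin hosted in WinRM by setting
# DSCAutomationHostEnabled to 0, then restart WinRM so the change takes effect.
# Run it locally and elevated: restarting WinRM drops PowerShell remoting sessions.
function Disable-DscTimerPlugin {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string] $KeyPath = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System'
    )

    if ($PSCmdlet.ShouldProcess("$KeyPath\DSCAutomationHostEnabled", 'Set to 0 and restart WinRM')) {
        # -Force overwrites the value if it already exists; DWORD is an assumed type.
        $null = New-ItemProperty -Path $KeyPath -Name 'DSCAutomationHostEnabled' -PropertyType DWord -Value 0 -Force
        Restart-Service -Name WinRM
    }
}

# Preview first, then apply:
# Disable-DscTimerPlugin -WhatIf
# Disable-DscTimerPlugin
```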

@djwork

djwork commented May 6, 2020

Note: this is still an issue with PowerShell 5.1 on Windows Server 2019,
and the workaround of setting HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled to 0 seems to prevent DSC settings from continuing to be applied after a reboot.


6 participants