Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xService Cannot Manage Failure Actions #678

Open
RandomNoun7 opened this issue Mar 27, 2020 · 15 comments · May be fixed by #679
Open

xService Cannot Manage Failure Actions #678

RandomNoun7 opened this issue Mar 27, 2020 · 15 comments · May be fixed by #679
Labels
enhancement The issue is an enhancement request. in progress The issue is being actively worked on by someone.

Comments

@RandomNoun7
Copy link

The xService resource cannot currently manage a service's Failure Actions.

The traditional way to set these actions is via sc.exe.

I propose that xService be modified to read the REG_BINARY field FailureActions registry value for the service.

The contents of the field are described in this Stack Overflow Post and Documented by Microsft on the MS Docs Site

I have a start to this functionality working that allows me to read the values from the registry and to compare against a desired state.

The next steps will be to implement settings the values in the registry, and then refactoring the code to make it cleaner and more maintainable.

The new proposed properties for the resource are:

ResetPeriodSeconds - The time after which to reset the failure count to zero if there are no failures, in seconds.
FailureCommand - The command line of the process for the CreateProcess function to execute in response to the SC_ACTION_RUN_COMMAND service controller action.
RebootMessage - The message to be broadcast to server users before rebooting in response to the SC_ACTION_REBOOT service controller action.
Failure1Action
Failure1Delay
Failure2Action
Failure2Delay
Failure3Action
Failure3Delay

The FailureAction represent 3 possible slots for failure actions.
The first two define what to do after first and second failure and the delay in milliseconds to wait before performing the action

The third slot is the action to be performed on subsequent failures that occur within the ResetPeriodSeconds time period.

Valid actions are documented on the MSDocs Site as:
NONE
REBOOT
RESTART
RUN_COMMAND

This issue is a restatement of PowerShell/PSDscResources#83 and PowerShell/PSDscResources#69

@RandomNoun7 RandomNoun7 linked a pull request Mar 27, 2020 that will close this issue
9 tasks
@RandomNoun7
Copy link
Author

So far in the course of testing, I have discovered two new things that will have to be accounted for.

The Recovery Option tab of the services properties dialog can only show you 3 recovery options, but in fact you can have a larger number of them stored in the registry. I describe the behavior of the recovery options taking into account the larger number of allowed options in this comment on the PR.

I have also found that there is another property I need to manage called FailureActionsOnNonCrashFailures which corresponds to the checkbox in the Recovery Options tab labeled Enable Actions For Stops With Errors. I describe the behavior in more detail in this comment on the PR.

Dealing with the extra registry key for that checkbox is the easier of the two issues.

To account for the possibility of an unkown number of failure actions that may need to be managed, I think I will have to change strategy slightly and remove the 6 numbered failure action parameters, and switch to a single failureActionsCollection parameter. This new parameter would take an array of hashtables. Each hash table would be required to have two keys action and delaySeconds.

I could then iterate over the collection and ensure that each is properly reflected in the registry key data.

@PlagueHO PlagueHO added enhancement The issue is an enhancement request. help wanted The issue is up for grabs for anyone in the community. labels Apr 7, 2020
@PlagueHO
Copy link
Member

PlagueHO commented Apr 7, 2020

Thanks for working on this @RandomNoun7 - it is much appreciated. I have been watching your draft PR with interest 😁

@RandomNoun7
Copy link
Author

Well, annoyingly, and as hard as this may be to believe, I am only just now running into the fact that DSC really can't deal with hashtables on their own. Now I need to decide if I want to deal with [Microsoft.Management.Infrastructure.CimInstance[]] in all of my code, deal with converting back and forth all the time, or just go back to only supporting 3 slots for failure actions so I don't have to deal with arrays at all.

That last option wuold certainly make my life easier, but I'm not sure I can bring myself to do it because I think there are instances with fresh Windows installs where there are already more than 3 failure actions stored in the registry, and not supporting that configuration in a new resource parameter and saying "If you want to use this you will have to accept data loss" is just not great.

@PlagueHO
Copy link
Member

Hi @RandomNoun7 - yes, this is a side affect of being dependent on CIM for the MOF. There are two patterns available to solve this:

  1. Use a Embedded Instances (like how xWebSite does it): https://github.com/dsccommunity/xWebAdministration/blob/master/source/DSCResources/MSFT_xWebSite/MSFT_xWebSite.schema.mof#L10
  2. Use a sub resource (e.g. create a new resource called xServiceFailureAction). @rchaganti wrote a good article on this over on PowerShell Magazine here: https://www.powershellmagazine.com/2017/05/23/psdsc-doing-it-right-resource-granularity/

Personally, I feel the 2nd option is the way to go here. I'm certain @gaelcolas also has thoughts on this as well.

@RandomNoun7
Copy link
Author

RandomNoun7 commented Apr 21, 2020

I'm not sure the second option would help me avoid the need for an array? In order to construct the registry key properly I have to know all of the failure actions that will go into the key, their properties, and the order they go in, and whether there is a restart command or a reboot message, all at the same time.

If I have it right, the second option would be to create a resource called xServiceFailureAction that would represent a single failure action like Restart the service after 100 seconds, and the service the action belongs to, and then the user would write a configuration that would declare some number of this resource and assign them to a specified service.

Unfortunately, I don't see how any one instance of that resource would have the information it needs to properly construct the registry key.

Given that this the first time I'm implementing a feature of this kind in DSC please do let me know if I have that wrong.

@RandomNoun7
Copy link
Author

RandomNoun7 commented Apr 27, 2020

I'v hit a bit of speedbump managing the check box for Enable actions for stops with errors. Any thoughts or ideas are welcome.
https://stackoverflow.com/questions/61463543/setting-a-windows-services-enable-actions-for-stops-with-errors-check-box-via

@RandomNoun7
Copy link
Author

I have found someone that wrote a PInvoke Implementation that seems to manage this flag. I want to test it and if it works and the author consents, to largely change strategy and go with PInvoke for this instead.
https://gist.github.com/jborean93/889288b56087a2c5def7fa49b6a8a0ad

@PlagueHO
Copy link
Member

Hi @RandomNoun7 - I think it is OK to use PInvoke method if no other alternative is available (I've used it in CertificateDsc). It is a little bit more difficult to create unit tests for though (it is possible but our CI pipelines don't know to run tests against C#) - so we end up having to rely on integration tests (which is OK if unavoidable).

It seems strange that there is no other way of doing this - seems to be a bit of an oversight somewhwere. I did a bit of searching but couldn't pull anything up - I sort of expected it to be exposed in CIM/WMI.

It appears that the PInvoke code is doing some escalation of privileges (could be wrong). That may not play well with the DSC LCM (or it might be just fine). Just something to be aware of.

@PlagueHO
Copy link
Member

One thing I'd suggest (that I should have done in CertificateDsc) is put the C# code in a separate .cs file and import that in the Add-Type rather than embedding the code. That way we can potentially run it through linting and other tools.

@RandomNoun7
Copy link
Author

I like the idea of separating out the code file. If I could find another method besides PInvoke I would definitely use it. I'm still searching to figure out how, if not that registry key, this setting is being persisted so I could just manage that, but searching online and a lot of time in PROCMON aren't turning it up. It really seems like the whole idea of this service failure actions business is a half baked afterthough on Microsofts part and I wish I could talk to someone to understand why and what on earth is going on with this feature.

@PlagueHO
Copy link
Member

It does seem odd that there isn't a straight forward/easy way of doing this. I'll ask around internally to see if I can find anyone who might be in the know here. But failing that we may have to go the PInvoke way...

@RandomNoun7
Copy link
Author

It occurs to me that one way to preserve the unit testability would be to use PInvoke solely for managing that one flag. I could even rip out everything in that code not needed for that specific purpose to make the .cs file itself a litle easier to understand.

I am curious to know that comes of your asking around internally though.

I should mention as well though that I think I was a little unkind in my criticism of how failure actions are implemented. As with most things in software development I'm certain that a lot of smart people were solving hard problems in the best way available given constraints unknown to use now.

@PlagueHO PlagueHO added in progress The issue is being actively worked on by someone. and removed help wanted The issue is up for grabs for anyone in the community. labels Apr 29, 2020
@PlagueHO
Copy link
Member

I was wondering: could you call SC.EXE to do this flag?
To Set:

& sc.exe @('qfailureflag', $ServiceName, 1) # True
& sc.exe @('qfailureflag', $ServiceName, 0) # False

To Get:

& sc.exe @('qfailureflag', $ServiceName)

Now, to find the right internal community to ask :) hehe

But I completely agree - this may just have been something that slipped under the radar and/or just wasn't high enough demand (e.g. SC.EXE can handle it). I've occasionally come across small gaps like that and wondered how they slipped through. I'm thinking that in a lot of cases GPO's handled them OK so it was just left as is. Another example is configuring a Network Proxy settings - the only way I could do that was encoding/decoding binary strings and dumping them into the registry (not unlike what you're doing here). Usually the hardest part is not the writing of the setting it is actually reading it.

@RandomNoun7
Copy link
Author

I was about to reply that the oversight extends to sc.exe itself, and that it doesn't work either, but then I took a really close look at your code snippets up there.

All of the documentation I had seen up to this point had led me to believe that the syntax for setting this flag was literally something like & sc.exe failureflag WSearch flag=1. I see now above though that you aren't passing flag=1 but just 1. I tested that change and it worked exactly as expected.

Much my furstration up to this point has partially been due to the fact that I believed sc.exe was also not working properly, so it's not fun to discover that I'v just been using it incorrectly like this, but there it is. It's doubly frustrating that sc has been reporting success and exiting zero this whole time despite the fact that I'v been using incorrect syntax.

So, with that said, absolutely yes, we can invoke sc for this one thing and that would probably be better than PInvoke for that single purpose. I would still be in favor of the registry approach for the other stuff just so I don't have to be in the business of parsing sc's strings to do the rest of it.

One thing that I'll have to keep in mind is that sc.exe is now known to report success and zero exit code when it hasn't actually done anything, so immediately after using it I'll have to query the registry real quick to make sure the setting actually took, but that seems to be reliable at this point.

@RandomNoun7
Copy link
Author

RandomNoun7 commented Apr 29, 2020

I would still love to know what sc is doing besides just setting that registry key, but this at least gives me a path forward to get the job done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement The issue is an enhancement request. in progress The issue is being actively worked on by someone.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants