Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor exit flow to limit forceful termination #942

Closed
wants to merge 1 commit into from
Closed

Refactor exit flow to limit forceful termination #942

wants to merge 1 commit into from

Conversation

kozhevnikov
Copy link
Member

Implements #889

Copy link
Member

@charlierudolph charlierudolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue had discussed using a CLI option and I would still prefer that. We could use it in this file with something like ‘cli.shouldDrainEventLoop()’ to use process.exitCode and otherwise use the existing logic

@kozhevnikov
Copy link
Member Author

It still forcefully terminates if anything hangs but now allows for few seconds (same as default timeout) to exist gracefully by itself (unref makes event loop ignore the pending timeout). If it's still running after 5 seconds the process is terminated just as before.

@charlierudolph
Copy link
Member

I don't like the 5 seconds delay for those that don't need it though. I would prefer as discussed in the issue as it retains current speedy behavior and allows for waiting for the event loop to close. The 5 second delay is a weird middle ground to me

@kozhevnikov
Copy link
Member Author

The delay is only for when bad things have happened, something have hung and somehow blocked cucumber/node from exiting on its own. The problem with having a binary choice between exit or drain is that in normal day to day operation I need a second for flush to be completed after cucumber have exited but node still have something in the event loop (e.g. file logger hasn't finished writing to file and missing important debug messages). If everything is fine then there's no delay, cucumber exists, buffer is flushed momentarily, node exists, all done, timeout is never triggered and discarded thanks to unref. If something unexpected does happen and node is stuck using drain argument would result is stuck CI builds. Having this failsafe makes cucumber behave the same as before in 99.999% cases while allowing for error to be logged and written and the node process still terminated forcefully in case where event loop cannot be drained by itself gracefully. It's not a middle ground but rather all encompassing fix.

@charlierudolph
Copy link
Member

file logger hasn't finished writing to file and missing important debug messages

Why is that handled async outside of cucumber? Could this be updated so cucumber can wait for it to finish or in a separate process so its not reliant on when cucumber exits?

It's not a middle ground but rather all encompassing fix.

I could see this delay not being enough for someone and thus having them ask for it to be bumped. Making the delay user configurable would be a better solution than the hardcoded 5 seconds but the use case you described makes me think this can be solved in a more reliable way

@kozhevnikov
Copy link
Member Author

I agree, 5 seconds is completely arbitrary, I used the default cucumber step timeout value. Whoever would want to increase it later on can make it configurable and submit a new PR. I my case 100ms is enough to flush file buffers but since it doesn't actually wait for the timeout to fire (unless it's hung) it makes no difference for performance if it's 100ms or 5s.

It's normal for logging libraries (i.e. winston, bunyan, etc). Calling write queues up message event for logging and multiple transport plugins process it, i.e. stdout transport is sync while file transport is async and any logging SaaS transport is async and slow but those typically have their own dedicated process that manages the queue and retries and delays of remote calls.

@charlierudolph
Copy link
Member

For the use case of logging libraries, I expect them to have a close or flush that you can add to a hook so that cucumber waits for them to finish. I really don't like adding waits to alleviate a timing issue instead of preventing the timing issue.

@kozhevnikov
Copy link
Member Author

kozhevnikov commented Oct 16, 2017

I'd love to fix the timing issue but I don't know how to reproduce it. You said it sometimes happens with UI browser tests so would have to be fixed in WebDriverJS anyway?

I feel like we might be on different pages so I'll try to clarify my point of view.

Requirement 1
Cucumber.js and any other framework/library should not be calling process.exit() explicitly as it abruptly terminates Node and prevents any pending cleanup tasks. It should instead set process.exitCode to indicate success or failure and let Node exist gracefully on its own as per Node recommendation.

To implement this requirement we need to restructure conditional flow of run.js so it's not relying on exit() termination but rather sets exitCode and lets run.js flow through to the end.

However you say there are exceptional circumstances where Node event loop cannot be drained properly and needs abrupt termination.

Requirement 2
Solution should be backwards compatible and terminate Node in exceptional circumstances when event loop cannot be drained by itself.

Your suggestion to have a flag that either calls exit() or sets exitCode would break both of the requirements. If flag is not used then Node is terminated early and we're back to not satisfying req 1. If the flag is used then your exceptional scenario that block event loop would result in stuck test runs and infinite build jobs making Cucumber.js unusable.

We need to account for both, the recommended safe drain of event loop as well as your hung browser tests that do require forceful termination.

It should not be a blind guess of the person running the tests to use the flag or not but of the framework itself on when to let the event loop drain and when to terminate abruptly but only if necessary.

Requirement 3
Solution should not affect performance and slow down the test run.

All three of the requirements are satisfied by the PR. Under normal conditions the even loop is allowed to drain as per recommendations. In exceptional circumstances Node is terminated. The performance is exactly as it was before since the safety timeout is unreferenced therefor not blocking event loop from draining as early as it possibly can and exiting Node safely and as quickly as it does now.

@charlierudolph
Copy link
Member

charlierudolph commented Oct 16, 2017

I consider requirement 1 and 2 to be competing with each other. The flag to switch between exit and exitCode is so that we can satisfy both requirements at different times. I'd prefer we do something similar to mocha which introduced a --no-exit flag which set the exit code and let the loop drain. In the next major version they made that the default, and required you to opt in to exiting immediately.

I consider this PR the step after the --no-exit flag, if we find that its not good enough. I prefer we try what I believe is a simpler solution first (and one used in another project bigger than this), as I don't think the delayed exit is needed.

@charlierudolph
Copy link
Member

See mochajs/mocha#1855 for an interesting idea of an extension to the --no-exit/--exit flag where the user wants it to error if the event loop doesn't drain. I think we need the simple solution of drain the event loop or not first and then can revisit the delay down the line as needed.

@kozhevnikov kozhevnikov deleted the process-exit branch November 16, 2017 14:53
@lock
Copy link

lock bot commented Nov 16, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Nov 16, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants