Refactor exit flow to limit forceful termination #942

kozhevnikov · 2017-10-04T13:19:06Z

Implements #889

charlierudolph

The issue had discussed using a CLI option and I would still prefer that. We could use it in this file with something like ‘cli.shouldDrainEventLoop()’ to use process.exitCode and otherwise use the existing logic

kozhevnikov · 2017-10-04T14:56:49Z

It still forcefully terminates if anything hangs but now allows for few seconds (same as default timeout) to exist gracefully by itself (unref makes event loop ignore the pending timeout). If it's still running after 5 seconds the process is terminated just as before.

charlierudolph · 2017-10-04T15:04:38Z

I don't like the 5 seconds delay for those that don't need it though. I would prefer as discussed in the issue as it retains current speedy behavior and allows for waiting for the event loop to close. The 5 second delay is a weird middle ground to me

kozhevnikov · 2017-10-04T15:22:23Z

The delay is only for when bad things have happened, something have hung and somehow blocked cucumber/node from exiting on its own. The problem with having a binary choice between exit or drain is that in normal day to day operation I need a second for flush to be completed after cucumber have exited but node still have something in the event loop (e.g. file logger hasn't finished writing to file and missing important debug messages). If everything is fine then there's no delay, cucumber exists, buffer is flushed momentarily, node exists, all done, timeout is never triggered and discarded thanks to unref. If something unexpected does happen and node is stuck using drain argument would result is stuck CI builds. Having this failsafe makes cucumber behave the same as before in 99.999% cases while allowing for error to be logged and written and the node process still terminated forcefully in case where event loop cannot be drained by itself gracefully. It's not a middle ground but rather all encompassing fix.

charlierudolph · 2017-10-04T15:34:28Z

file logger hasn't finished writing to file and missing important debug messages

Why is that handled async outside of cucumber? Could this be updated so cucumber can wait for it to finish or in a separate process so its not reliant on when cucumber exits?

It's not a middle ground but rather all encompassing fix.

I could see this delay not being enough for someone and thus having them ask for it to be bumped. Making the delay user configurable would be a better solution than the hardcoded 5 seconds but the use case you described makes me think this can be solved in a more reliable way

kozhevnikov · 2017-10-04T16:06:36Z

I agree, 5 seconds is completely arbitrary, I used the default cucumber step timeout value. Whoever would want to increase it later on can make it configurable and submit a new PR. I my case 100ms is enough to flush file buffers but since it doesn't actually wait for the timeout to fire (unless it's hung) it makes no difference for performance if it's 100ms or 5s.

It's normal for logging libraries (i.e. winston, bunyan, etc). Calling write queues up message event for logging and multiple transport plugins process it, i.e. stdout transport is sync while file transport is async and any logging SaaS transport is async and slow but those typically have their own dedicated process that manages the queue and retries and delays of remote calls.

charlierudolph · 2017-10-13T03:56:15Z

For the use case of logging libraries, I expect them to have a close or flush that you can add to a hook so that cucumber waits for them to finish. I really don't like adding waits to alleviate a timing issue instead of preventing the timing issue.

kozhevnikov · 2017-10-16T11:07:02Z

I'd love to fix the timing issue but I don't know how to reproduce it. You said it sometimes happens with UI browser tests so would have to be fixed in WebDriverJS anyway?

I feel like we might be on different pages so I'll try to clarify my point of view.

Requirement 1
Cucumber.js and any other framework/library should not be calling process.exit() explicitly as it abruptly terminates Node and prevents any pending cleanup tasks. It should instead set process.exitCode to indicate success or failure and let Node exist gracefully on its own as per Node recommendation.

To implement this requirement we need to restructure conditional flow of run.js so it's not relying on exit() termination but rather sets exitCode and lets run.js flow through to the end.

However you say there are exceptional circumstances where Node event loop cannot be drained properly and needs abrupt termination.

Requirement 2
Solution should be backwards compatible and terminate Node in exceptional circumstances when event loop cannot be drained by itself.

Your suggestion to have a flag that either calls exit() or sets exitCode would break both of the requirements. If flag is not used then Node is terminated early and we're back to not satisfying req 1. If the flag is used then your exceptional scenario that block event loop would result in stuck test runs and infinite build jobs making Cucumber.js unusable.

We need to account for both, the recommended safe drain of event loop as well as your hung browser tests that do require forceful termination.

It should not be a blind guess of the person running the tests to use the flag or not but of the framework itself on when to let the event loop drain and when to terminate abruptly but only if necessary.

Requirement 3
Solution should not affect performance and slow down the test run.

All three of the requirements are satisfied by the PR. Under normal conditions the even loop is allowed to drain as per recommendations. In exceptional circumstances Node is terminated. The performance is exactly as it was before since the safety timeout is unreferenced therefor not blocking event loop from draining as early as it possibly can and exiting Node safely and as quickly as it does now.

charlierudolph · 2017-10-16T14:00:33Z

I consider requirement 1 and 2 to be competing with each other. The flag to switch between exit and exitCode is so that we can satisfy both requirements at different times. I'd prefer we do something similar to mocha which introduced a --no-exit flag which set the exit code and let the loop drain. In the next major version they made that the default, and required you to opt in to exiting immediately.

I consider this PR the step after the --no-exit flag, if we find that its not good enough. I prefer we try what I believe is a simpler solution first (and one used in another project bigger than this), as I don't think the delayed exit is needed.

charlierudolph · 2017-10-16T14:14:16Z

See mochajs/mocha#1855 for an interesting idea of an extension to the --no-exit/--exit flag where the user wants it to error if the event loop doesn't drain. I think we need the simple solution of drain the event loop or not first and then can revisit the delay down the line as needed.

lock · 2018-11-16T15:52:23Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Refactor exit flow to limit forceful termination

c68afaa

charlierudolph reviewed Oct 4, 2017

View reviewed changes

kozhevnikov closed this Nov 16, 2017

kozhevnikov deleted the process-exit branch November 16, 2017 14:53

lock bot locked as resolved and limited conversation to collaborators Nov 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor exit flow to limit forceful termination #942

Refactor exit flow to limit forceful termination #942

kozhevnikov commented Oct 4, 2017

charlierudolph left a comment

kozhevnikov commented Oct 4, 2017

charlierudolph commented Oct 4, 2017

kozhevnikov commented Oct 4, 2017

charlierudolph commented Oct 4, 2017

kozhevnikov commented Oct 4, 2017

charlierudolph commented Oct 13, 2017

kozhevnikov commented Oct 16, 2017 •

edited

charlierudolph commented Oct 16, 2017 •

edited

charlierudolph commented Oct 16, 2017

lock bot commented Nov 16, 2018

Refactor exit flow to limit forceful termination #942

Refactor exit flow to limit forceful termination #942

Conversation

kozhevnikov commented Oct 4, 2017

charlierudolph left a comment

Choose a reason for hiding this comment

kozhevnikov commented Oct 4, 2017

charlierudolph commented Oct 4, 2017

kozhevnikov commented Oct 4, 2017

charlierudolph commented Oct 4, 2017

kozhevnikov commented Oct 4, 2017

charlierudolph commented Oct 13, 2017

kozhevnikov commented Oct 16, 2017 • edited

charlierudolph commented Oct 16, 2017 • edited

charlierudolph commented Oct 16, 2017

lock bot commented Nov 16, 2018

kozhevnikov commented Oct 16, 2017 •

edited

charlierudolph commented Oct 16, 2017 •

edited