Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reproduce worker termination race condition #73

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

e-shubin
Copy link

There is a race condition between checking out of a worker from the pool and worker process's termination. In test case provided worker is terminated via gen_server:cast for more reliable reproducing, but it also occurs on gen_server:call.
Is it a know issue? Is there any recommendation of how to avoid it?

@devinus
Copy link
Owner

devinus commented Apr 24, 2015

@e-shubin Can you explain what you expect the behavior to be?

@eshubin
Copy link

eshubin commented Apr 27, 2015

@devinus I would like poolboy not to checkout a "dead" worker. In first assert I checkout a woker and execute an operation, which makes it to terminate. Then I try to do another operation, but terminated process is checked out. For me it seems to be a race condition.

@fishcakez
Copy link
Contributor

It is impossible to guarantee a "dead" worker is not checked out because a worker could die after being checked out but before the caller has handled the reply with the worker from poolboy. If you want to try to ensure a "dead" worker is not checked out the worker must exit and not be checked in (or checked in after it exits in the case of a transaction).

Therefore the race condition is in the test, and not in poolboy. A cast is sent to the worker telling it to die but the client checks in in the worker without waiting for the worker to exit. This means the next client might get a worker that is about to die.

@devinus
Copy link
Owner

devinus commented Apr 29, 2015

I've been thinking about this issue for the past few days and also agree with @fishcakez. You're telling the worker to die, and the pool as no idea the worker is dead yet because the cleanup facilities haven't been triggered in the pool yet. It's not until after the worker is checked out again that the worker is able to shutdown properly. @fishcakez Do you know of any reliable way to test that a worker is not being shutdown before checkout? @e-shubin I would handle this case by checking in the worker yourself after you know it's shutdown.

@fishcakez
Copy link
Contributor

@devinus do you mean something like is_process_alive/1? I am unsure if it is worth doing that because the race condition would still exist.

@devinus
Copy link
Owner

devinus commented Apr 29, 2015

@fishcakez No I was thinking of something like checking the processes state or something weird like that.

@fishcakez
Copy link
Contributor

That does check its alive/exited state? Or do you mean like a sys call? I think it would only be possible to do this nicely if workers, rather than clients, checked the worker in. But that would greatly take away from poolboys simplicity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants