reproduce worker termination race condition #73

e-shubin · 2015-04-20T16:08:16Z

There is a race condition between checking out of a worker from the pool and worker process's termination. In test case provided worker is terminated via gen_server:cast for more reliable reproducing, but it also occurs on gen_server:call.
Is it a know issue? Is there any recommendation of how to avoid it?

devinus · 2015-04-24T01:26:51Z

@e-shubin Can you explain what you expect the behavior to be?

eshubin · 2015-04-27T08:29:11Z

@devinus I would like poolboy not to checkout a "dead" worker. In first assert I checkout a woker and execute an operation, which makes it to terminate. Then I try to do another operation, but terminated process is checked out. For me it seems to be a race condition.

fishcakez · 2015-04-29T21:00:22Z

It is impossible to guarantee a "dead" worker is not checked out because a worker could die after being checked out but before the caller has handled the reply with the worker from poolboy. If you want to try to ensure a "dead" worker is not checked out the worker must exit and not be checked in (or checked in after it exits in the case of a transaction).

Therefore the race condition is in the test, and not in poolboy. A cast is sent to the worker telling it to die but the client checks in in the worker without waiting for the worker to exit. This means the next client might get a worker that is about to die.

devinus · 2015-04-29T21:37:46Z

I've been thinking about this issue for the past few days and also agree with @fishcakez. You're telling the worker to die, and the pool as no idea the worker is dead yet because the cleanup facilities haven't been triggered in the pool yet. It's not until after the worker is checked out again that the worker is able to shutdown properly. @fishcakez Do you know of any reliable way to test that a worker is not being shutdown before checkout? @e-shubin I would handle this case by checking in the worker yourself after you know it's shutdown.

fishcakez · 2015-04-29T21:58:52Z

@devinus do you mean something like is_process_alive/1? I am unsure if it is worth doing that because the race condition would still exist.

devinus · 2015-04-29T22:15:05Z

@fishcakez No I was thinking of something like checking the processes state or something weird like that.

fishcakez · 2015-04-29T22:44:29Z

That does check its alive/exited state? Or do you mean like a sys call? I think it would only be possible to do this nicely if workers, rather than clients, checked the worker in. But that would greatly take away from poolboys simplicity.

reproduce worker termination race condition

c36df9a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reproduce worker termination race condition #73

reproduce worker termination race condition #73

e-shubin commented Apr 20, 2015

devinus commented Apr 24, 2015

eshubin commented Apr 27, 2015

fishcakez commented Apr 29, 2015

devinus commented Apr 29, 2015

fishcakez commented Apr 29, 2015

devinus commented Apr 29, 2015

fishcakez commented Apr 29, 2015

reproduce worker termination race condition #73

Are you sure you want to change the base?

reproduce worker termination race condition #73

Conversation

e-shubin commented Apr 20, 2015

devinus commented Apr 24, 2015

eshubin commented Apr 27, 2015

fishcakez commented Apr 29, 2015

devinus commented Apr 29, 2015

fishcakez commented Apr 29, 2015

devinus commented Apr 29, 2015

fishcakez commented Apr 29, 2015