Hang in subprocess.Popen() #67

Open
gvanrossum opened this issue Mar 22, 2018 · 8 comments
@gvanrossum (Contributor) commented Mar 22, 2018

[UPDATE: If you have this problem, the solution is to pass close_fds=True to Popen()]

I've got a use case where a process being traced for type collection uses subprocess.Popen() to execute some helper program, and the Popen() call hangs at the line

data = _eintr_retry_call(os.read, errpipe_read, 1048576)

in Popen._execute_child(). (This is Python 2.7 on Mac, i.e. POSIX.)

That pipe has FD_CLOEXEC, so the child is not hitting the exec(). Presumably this is because it hangs in a Queue.put() operation in _trace_dispatch() (e.g. here).

I can think of a gross fix that monkey-patches os.fork to disable the profiling hook around the fork() so the child doesn't do this. But perhaps there's a more elegant solution (without using os.register_at_fork(), which is Python 3.7+ only)? Or the tracing hook could check the pid?

[UPDATE:] I can't repro this in a small test program. But it's real, and the os.fork monkey-patch fixes it. Not sure what to do about it yet, the monkey-patch seems risky.
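For reference, the workaround named in the update looks like this at a call site (a sketch; `echo` is a stand-in for the helper program being launched by the traced process):

```python
import subprocess

# Passing close_fds=True makes the child close all inherited descriptors
# except stdin/stdout/stderr, including pipe ends leaked by other threads.
proc = subprocess.Popen(
    ["echo", "hello"],
    stdout=subprocess.PIPE,
    close_fds=True,
)
out, _ = proc.communicate()
print(out.decode().strip())  # hello
```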

@xiaket commented Jun 28, 2018

I think I've bumped into this one today (with subprocess.check_output), and I'll see whether I can reproduce it with some test code.

@xiaket commented Jun 29, 2018

This is a weird case. Here is how I (somehow) reproduced it:

The test code

# cat anno.py
import subprocess
import unittest

from pyannotate_runtime import collect_types
from schematics import Model
from schematics.types import StringType


class GitPackageSource(Model):
    uri = StringType(required=True)
    tag = StringType(serialize_when_none=False)
    branch = StringType(serialize_when_none=False)
    commit = StringType(serialize_when_none=False)


class SourceProcessor(object):
    def sleep(self):
        subprocess.check_output(["sleep", "1"])
        return "ok"


class TestUtil(unittest.TestCase):
    def test_SourceProcessor_helpers(self):
        sp = SourceProcessor()

        for i in [{"uri": "git@foo"}, {"tag": "foo"}, {"commit": "1234"}]:
            g = GitPackageSource(i)
        print("GitPackageSource instantiated")

        self.assertEqual(sp.sleep(), "ok")


if __name__ == "__main__":
    collect_types.init_types_collection()
    with collect_types.collect():
        unittest.main()

The setup process

{~}mkdir test && cd test
{~/test}virtualenv venv
New python executable in /Users/shi.kaix/test/venv/bin/python2.7
Also creating executable in /Users/shi.kaix/test/venv/bin/python
Installing setuptools, pip, wheel...done.
{~/test}source venv/bin/activate
∇{~/test}pip install pyannotate schematics
Successfully installed mypy-extensions-0.3.0 pyannotate-1.0.6 schematics-2.1.0 six-1.11.0 typing-3.6.4
∇{~/test}for i in `seq 1 10`; do gtimeout 5 python anno.py; echo $?; done

That is: create a new directory, initialize a virtualenv, install the dependencies, and run the test code 10 times. I've observed that about 80% of the runs do not finish within 5 seconds.

The output

A typical output could look like this:

GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
.
----------------------------------------------------------------------
Ran 1 test in 1.005s

OK
0
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
.
----------------------------------------------------------------------
Ran 1 test in 1.134s

OK
0

So most of the time, the gtimeout command exits with 124, which signifies that the Python process was killed after the timeout.

My notes

  1. The original command I used was git ls-remote git@github.com:cloudtools/stacker.git, which produces some output. It took around 1 second to finish on my end and the error rate was around 80%. I switched it to sleep 1 and got nearly the same error rate.
  2. You will of course note the GitPackageSource instantiated message in the output. That part runs every time and always finishes successfully. However, if we remove this part:
        for i in [{"uri": "git@foo"}, {"tag": "foo"}, {"commit": "1234"}]:
            g = GitPackageSource(i)
        print("GitPackageSource instantiated")

     the error rate drops significantly, to around 10%. If we replace GitPackageSource with UserDict, the error rate is also around 10%.
  3. If, instead of sending SIGTERM, we instruct gtimeout to send SIGINT to the process, we get:
∇{~/test}gtimeout -s SIGINT 5 python anno.py
Traceback (most recent call last):
  File "anno.py", line 36, in <module>
    unittest.main()
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/runner.py", line 151, in run
    test(result)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "anno.py", line 30, in test_SourceProcessor_helpers
    self.assertEqual(sp.sleep(), "ok")
  File "anno.py", line 19, in sleep
    subprocess.check_output(["sleep", "1"])
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 216, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1023, in _execute_child
    data = _eintr_retry_call(os.read, errpipe_read, 1048576)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 125, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

I guess this is what you have seen in your case?

For reference

I don't think it's related, but the software versions are as follows:

∇{~/test}python --version
Python 2.7.15
∇{~/test}uname -a
Darwin sbmelmac-07466.local 15.6.0 Darwin Kernel Version 15.6.0: Wed May  2 21:04:22 PDT 2018; root:xnu-3248.73.10~1/RELEASE_X86_64 x86_64

# OS X El Capitan: 10.11.6 (15G21013)

I hope I haven't made any silly mistakes (because I'm facing Guido here :P).

@gvanrossum (Contributor, Author)

No worries. Alas, I followed your recipe closely but got no errors. So the search for a reproducible testcase continues...

@artem-kamyshev

Hello,

I'm not a user of pyannotate; I just hit a similar issue with my own code and got here while googling. Anyway, I've looked through the code to get familiar with it. The `subprocess.Popen` that hangs is the standard Python 2.7 implementation, and it has a multithreading issue:

If you try to start two or more subprocesses from different threads, they may end up in a situation where one of the Popen calls sits on `data = _eintr_retry_call(os.read, errpipe_read, 1048576)`, waiting for the child of another Popen (called from another thread) to finish.

This happens because the FD_CLOEXEC flag may not be set yet when the fork() in another thread happens. This is a known bug in Python 2.7: https://bugs.python.org/issue20318

I guess it's not difficult to make a test case to reproduce the issue now. Hope this helps.
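The window @artem-kamyshev describes can be shown concretely; a small POSIX-only sketch (the fcntl lines in the comment paraphrase Python 2's two-step approach, they are not quoted from it):

```python
import os

# Two separate steps, roughly what Python 2's Popen did (racy under threads):
#     r, w = os.pipe()
#     fcntl.fcntl(w, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
# A fork() in another thread between those calls inherits the write end.
# One atomic step, as CPython 3 does internally, closes that window:
r, w = os.pipe2(os.O_CLOEXEC)
print(os.get_inheritable(w))  # False: the fd can never leak across exec
os.close(r)
os.close(w)
```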

@gvanrossum (Contributor, Author)

Thanks @artem-kamyshev, that's very helpful information. I'll see if I can make time to test that!

@dcsommer commented Nov 13, 2018

I had this issue too and landed here via Google, like @artem-kamyshev. I managed to track down the issue to not setting close_fds=True on my subprocess.Popen() calls thanks to reading marcelm/cutadapt#137 and the commit that fixed it.

This pitfall in subprocess.Popen() is outlined in https://bugs.python.org/issue12786 (specifically in this comment) and explains why Python 3 defaults close_fds to True rather than False as in Python 2.
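The changed default is easy to confirm from the signature on Python 3:

```python
import inspect
import subprocess

# Python 3 changed the default: close_fds=True (in Python 2 it was False).
sig = inspect.signature(subprocess.Popen.__init__)
print(sig.parameters["close_fds"].default)  # True
```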

@gvanrossum (Contributor, Author)

This is convincing, but it is also not good news -- it means we can't fix this in pyannotate itself, we have to fix the calling application to pass close_fds=True to Popen() (or the subprocess wrapper functions).
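One low-friction way to fix the calling application is a thin wrapper used at every call site (`popen_closefds` is a hypothetical name, not part of any library):

```python
import functools
import subprocess

# Hypothetical helper: route every call through a partial so no
# individual call site can forget close_fds=True.
popen_closefds = functools.partial(subprocess.Popen, close_fds=True)

proc = popen_closefds(["echo", "ok"], stdout=subprocess.PIPE)
out, _ = proc.communicate()
print(out.decode().strip())  # ok
```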

@jbianchia2e

In the bug linked by @artem-kamyshev, they suggest using subprocess32.
