Hang in subprocess.Popen() #67

Open
gvanrossum opened this issue Mar 22, 2018 · 8 comments
@gvanrossum (Contributor) commented Mar 22, 2018

[UPDATE: If you have this problem, the solution is to pass close_fds=True to Popen()]

I've got a use case where a process being traced for type collection uses subprocess.Popen() to execute some helper program, and the Popen() call hangs at the line

data = _eintr_retry_call(os.read, errpipe_read, 1048576)

in Popen._execute_child(). (This is Python 2.7 on Mac, i.e. POSIX.)

That pipe has FD_CLOEXEC, so the child is not hitting the exec(). Presumably this is because it hangs in a Queue.put() operation in _trace_dispatch() (e.g. here).

I can think of a gross fix that monkey-patches os.fork to disable the profiling hook around the fork() so the child doesn't do this. But perhaps there's a more elegant solution (without using os.register_at_fork(), which is Python 3.7+ only)? Or the tracing hook could check the pid?

[UPDATE:] I can't repro this in a small test program. But it's real, and the os.fork monkey-patch fixes it. Not sure what to do about it yet, the monkey-patch seems risky.
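For reference, the workaround named in the update looks like this at a call site (a sketch; `echo` is a stand-in for the helper program being launched by the traced process):

```python
import subprocess

# Passing close_fds=True makes the child close all inherited descriptors
# except stdin/stdout/stderr, including pipe ends leaked by other threads.
proc = subprocess.Popen(
    ["echo", "hello"],
    stdout=subprocess.PIPE,
    close_fds=True,
)
out, _ = proc.communicate()
print(out.decode().strip())  # hello
```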

@xiaket commented Jun 28, 2018

I think I've bumped into this one today (with subprocess.check_output), and I'll see whether I can reproduce it with some test code.

@xiaket commented Jun 29, 2018

This is a weird case. Here is how I (somehow) reproduced it:

The test code

# cat anno.py
import subprocess
import unittest

from pyannotate_runtime import collect_types
from schematics import Model
from schematics.types import StringType


class GitPackageSource(Model):
    uri = StringType(required=True)
    tag = StringType(serialize_when_none=False)
    branch = StringType(serialize_when_none=False)
    commit = StringType(serialize_when_none=False)


class SourceProcessor(object):
    def sleep(self):
        subprocess.check_output(["sleep", "1"])
        return "ok"


class TestUtil(unittest.TestCase):
    def test_SourceProcessor_helpers(self):
        sp = SourceProcessor()

        for i in [{"uri": "git@foo"}, {"tag": "foo"}, {"commit": "1234"}]:
            g = GitPackageSource(i)
        print("GitPackageSource instantiated")

        self.assertEqual(sp.sleep(), "ok")


if __name__ == "__main__":
    collect_types.init_types_collection()
    with collect_types.collect():
        unittest.main()

The setup process

{~}mkdir test && cd test
{~/test}virtualenv venv
New python executable in /Users/shi.kaix/test/venv/bin/python2.7
Also creating executable in /Users/shi.kaix/test/venv/bin/python
Installing setuptools, pip, wheel...done.
{~/test}source venv/bin/activate
∇{~/test}pip install pyannotate schematics
Successfully installed mypy-extensions-0.3.0 pyannotate-1.0.6 schematics-2.1.0 six-1.11.0 typing-3.6.4
∇{~/test}for i in `seq 1 10`; do gtimeout 5 python anno.py; echo $?; done

That is: create a new directory, initialize a virtualenv, install the dependencies, and run the test code 10 times. I've observed that about 80% of the runs do not finish within 5 seconds.

The output

A typical output could look like this:

GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
.
----------------------------------------------------------------------
Ran 1 test in 1.005s

OK
0
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
124
GitPackageSource instantiated
.
----------------------------------------------------------------------
Ran 1 test in 1.134s

OK
0

So most of the time, the gtimeout command exits with 124, which signifies that the Python process was killed after the timeout.

My notes

  1. The original command I used was git ls-remote git@github.com:cloudtools/stacker.git, which produces some output. It took around 1 second to finish on my end and the error rate was around 80%. I switched it to sleep 1 and got nearly the same error rate.
  2. You will of course note the GitPackageSource instantiated message in the output. That part runs every time and always finishes successfully. However, if we remove this part:
        for i in [{"uri": "git@foo"}, {"tag": "foo"}, {"commit": "1234"}]:
            g = GitPackageSource(i)
        print("GitPackageSource instantiated")

     the error rate drops significantly, to around 10%. If we replace GitPackageSource with UserDict, the error rate is also around 10%.
  3. If, instead of sending SIGTERM, we instruct gtimeout to send SIGINT to the process, we get:
∇{~/test}gtimeout -s SIGINT 5 python anno.py
Traceback (most recent call last):
  File "anno.py", line 36, in <module>
    unittest.main()
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/runner.py", line 151, in run
    test(result)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "anno.py", line 30, in test_SourceProcessor_helpers
    self.assertEqual(sp.sleep(), "ok")
  File "anno.py", line 19, in sleep
    subprocess.check_output(["sleep", "1"])
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 216, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1023, in _execute_child
    data = _eintr_retry_call(os.read, errpipe_read, 1048576)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 125, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

I guess this is what you have seen in your case?

For reference

I don't think it's related, but the software versions are as follows:

∇{~/test}python --version
Python 2.7.15
∇{~/test}uname -a
Darwin sbmelmac-07466.local 15.6.0 Darwin Kernel Version 15.6.0: Wed May  2 21:04:22 PDT 2018; root:xnu-3248.73.10~1/RELEASE_X86_64 x86_64

# OS X El Capitan: 10.11.6 (15G21013)

I hope I haven't made any silly mistakes (because I'm facing Guido here :P).

@gvanrossum (Contributor, Author)

No worries. Alas, I followed your recipe closely but got no errors. So the search for a reproducible testcase continues...

@artem-kamyshev

Hello,

I'm not a user of pyannotate; I just hit a similar issue with my own code and got here while googling. Anyway, I've looked through the code to get familiar with it. The `subprocess.Popen` that hangs is the standard Python 2.7 implementation, and it has a multithreading issue:

If you try to start two or more subprocesses from different threads, they may end up in a situation where one of the Popen calls sits on `data = _eintr_retry_call(os.read, errpipe_read, 1048576)`, waiting for the child of another Popen (called from another thread) to finish.

This happens because the FD_CLOEXEC flag may not be set yet when the fork() in another thread happens. This is a known bug in Python 2.7: https://bugs.python.org/issue20318

I guess it's not difficult to make a test case to reproduce the issue now. Hope this helps.
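The window @artem-kamyshev describes can be shown concretely; a small POSIX-only sketch (the fcntl lines in the comment paraphrase Python 2's two-step approach, they are not quoted from it):

```python
import os

# Two separate steps, roughly what Python 2's Popen did (racy under threads):
#     r, w = os.pipe()
#     fcntl.fcntl(w, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
# A fork() in another thread between those calls inherits the write end.
# One atomic step, as CPython 3 does internally, closes that window:
r, w = os.pipe2(os.O_CLOEXEC)
print(os.get_inheritable(w))  # False: the fd can never leak across exec
os.close(r)
os.close(w)
```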

@gvanrossum (Contributor, Author)

Thanks @artem-kamyshev, that's very helpful information. I'll see if I can make time to test that!

@dcsommer commented Nov 13, 2018

I had this issue too and landed here via Google, like @artem-kamyshev. I managed to track down the issue to not setting close_fds=True on my subprocess.Popen() calls thanks to reading marcelm/cutadapt#137 and the commit that fixed it.

This pitfall in subprocess.Popen() is outlined in https://bugs.python.org/issue12786 (specifically in this comment) and explains why Python 3 defaults close_fds to True rather than False as in Python 2.
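The changed default is easy to confirm from the signature on Python 3:

```python
import inspect
import subprocess

# Python 3 changed the default: close_fds=True (in Python 2 it was False).
sig = inspect.signature(subprocess.Popen.__init__)
print(sig.parameters["close_fds"].default)  # True
```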

@gvanrossum (Contributor, Author)

This is convincing, but it is also not good news -- it means we can't fix this in pyannotate itself, we have to fix the calling application to pass close_fds=True to Popen() (or the subprocess wrapper functions).
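One low-friction way to fix the calling application is a thin wrapper used at every call site (`popen_closefds` is a hypothetical name, not part of any library):

```python
import functools
import subprocess

# Hypothetical helper: route every call through a partial so no
# individual call site can forget close_fds=True.
popen_closefds = functools.partial(subprocess.Popen, close_fds=True)

proc = popen_closefds(["echo", "ok"], stdout=subprocess.PIPE)
out, _ = proc.communicate()
print(out.decode().strip())  # ok
```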

@jbianchia2e

In the bug linked by @artem-kamyshev, they suggest using subprocess32.
