Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python tests fail because failed GIL interaction #1939

Open
negril opened this issue Jan 28, 2024 · 9 comments
Open

python tests fail because failed GIL interaction #1939

negril opened this issue Jan 28, 2024 · 9 comments
Labels
Python Issues that involve majority python development (vs C++).

Comments

@negril
Copy link

negril commented Jan 28, 2024

Tested with 2.3.0 and 2.3.1 on gentoo using python-3.11 and python-3.12.

Running ctest -R test_python fails with:

test_apply (CPUProcessorTest.CPUProcessorTest.test_apply) ... pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a numpy.ndarray object.
ERROR
test_apply_src_dst (CPUProcessorTest.CPUProcessorTest.test_apply_src_dst) ... pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a numpy.ndarray object.
ERROR
======================================================================
ERROR: test_apply (CPUProcessorTest.CPUProcessorTest.test_apply)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/paludis/media-libs-opencolorio-2.3.1/work/OpenColorIO-2.3.1/tests/python/CPUProcessorTest.py", line 302, in test_apply
    image = OCIO.PackedImageDesc(arr, 7, 3, 3)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.

======================================================================
ERROR: test_apply_src_dst (CPUProcessorTest.CPUProcessorTest.test_apply_src_dst)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/paludis/media-libs-opencolorio-2.3.1/work/OpenColorIO-2.3.1/tests/python/CPUProcessorTest.py", line 331, in test_apply_src_dst
    src_image = OCIO.PackedImageDesc(src_arr, 7, 3, 3)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.

----------------------------------------------------------------------
@carolalynn carolalynn added the Python Issues that involve majority python development (vs C++). label Feb 7, 2024
@meimchu
Copy link
Contributor

meimchu commented Feb 7, 2024

I would like to spend some time looking into this. I want to say that there isn't an issue and we could use PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREF to disable this check. But that may not be ideal either.
Fortunately the VFX Reference platform only goes up to python 3.11 for now so just to be sure, this is not a problem for that version of python, correct?

@negril
Copy link
Author

negril commented Feb 7, 2024

Tested with 2.3.0 and 2.3.1 on gentoo using python-3.11 and python-3.12.

Python-3.11 is affected as well.

@remia
Copy link
Collaborator

remia commented Feb 8, 2024

It would be interesting to find out why this error isn't triggered in our CI, we test on both Python 3.11 and 3.12 when building the wheels. Maybe Python is compiled in a different way on Gentoo? Which version of Pybind11 are you using? OCIO is using a pretty old version I think at this point and newer versions are untested.

@negril
Copy link
Author

negril commented Feb 8, 2024

Versions used are:

  • python-3.11.7
  • python-3.12.1_p1
  • pybind11-2.11.1
  • gcc-12.3.1_p20240112

I'll update to python-3.11.8 and python-3.12.2 later today and retest, doubt it changes anything.

I can write a Dockerfile if that helps you reproduce it.

@remia
Copy link
Collaborator

remia commented Feb 8, 2024

Thanks for the information @negril, we are using Pybind11 2.9.2 so it could be interesting on our side to try with 2.11.1.

@remia
Copy link
Collaborator

remia commented Feb 8, 2024

Strangely enough, after explicitly using Pybind11 2.11.1 and Python 3.12 on macOS, Debug build, I can't get any of these assertions to trigger, even after adding obviously wrong acquire / release sequences in the code. Otherwise I can't see anything obviously wrong we are doing in the Pybind11 code for the time being.

Maybe a Docker image could help yeah, @meimchu did you manage to reproduce the issue on your side?

@meimchu
Copy link
Contributor

meimchu commented Feb 9, 2024

Indeed I agree with maybe having a docker image could be helpful with debugging.
I have Python 3.9.6 on my MacOS unfortunately though I did try to build it with the newer Pybind11 2.11.1 and did not encounter that issue.

@meimchu
Copy link
Contributor

meimchu commented Feb 10, 2024

I have used pyenv to get both python 3.11.7 and python 3.12.1 on my MacOS. I also changed the build setting to use pybind 2.11.1 to build. I ran both ctest -R test_python and manually test with both versions of python on the OpenColorIOTestSuite.py. None of them appear to have return the same issues as all tests passed. Feels like a potential variable here might be the GCC version then? At least in the case of VFX reference platform, they’ve been sticking with GCC 11.2.1 for 2023/2024 for Linux distros. This mystery continues, it seems!

@remia
Copy link
Collaborator

remia commented Mar 21, 2024

I did hit the same issue while working on CI updates, could you try again on the main branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Python Issues that involve majority python development (vs C++).
Projects
None yet
Development

No branches or pull requests

4 participants