test_repl's test_no_memory: AssertionError: -6 not found in (1, 120) #118331

Closed
hugovk opened this issue Apr 26, 2024 · 5 comments

@hugovk
Member

hugovk commented Apr 26, 2024

Bug report

Bug description:

One of my PRs (#118283) started failing (https://github.com/python/cpython/actions/runs/8843882336?pr=118283) with the error in the title. Investigating, I can reproduce it locally on main, although I don't know why CI passes on main.

OS: macOS Sonoma 14.4.1, M2.

To reproduce

git clone https://github.com/python/cpython
cd cpython
./configure --with-pydebug && make -j10
./python.exe Lib/test/test_repl.py

Actual result

.....F.
======================================================================
FAIL: test_no_memory (__main__.TestInteractiveInterpreter.test_no_memory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/tmp/cpython/Lib/test/test_repl.py", line 86, in test_no_memory
    self.assertIn(p.returncode, (1, 120))
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: -6 not found in (1, 120)

Printing output from the test:

Python 3.13.0a5+ (bisect/good-e16062dd3428a5846344e0a8c6ee2f352d34ce1b-1-gdf73179048:df73179048, A) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> >>> >>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    1/0
    ~^~
ZeroDivisionError: division by zero
>>> After the exception.
>>> object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
Fatal Python error: _Py_Dealloc: Deallocator of type '_thread._localdummy' raised an exception
Python runtime state: finalizing (tstate=0x0000000102bf3c40)

Current thread 0x0000000201f77ac0 (most recent call first):
  <no Python frame>

Expected result

It passes with v3.13.0a5, where the output is:

Python 3.13.0a5 (tags/v3.13.0a5:076d169ebb, Apr 26 2024, 22:13:38) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> >>> >>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    1/0
    ~^~
ZeroDivisionError: division by zero
>>> After the exception.
>>> object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr

The main difference is that the failing run's output also includes this:

Fatal Python error: _Py_Dealloc: Deallocator of type '_thread._localdummy' raised an exception
Python runtime state: finalizing (tstate=0x0000000102bf3c40)

Current thread 0x0000000201f77ac0 (most recent call first):
  <no Python frame>

Bisecting

df7317904849a41d51db39d92c5d431a18e22637 is the first bad commit
commit df7317904849a41d51db39d92c5d431a18e22637
Author: mpage <mpage@meta.com>
Date:   Mon Apr 8 07:58:38 2024 -0700

    gh-111926: Make weakrefs thread-safe in free-threaded builds (#117168)

    Most mutable data is protected by a striped lock that is keyed on the
    referenced object's address. The weakref's hash is protected using the
    weakref's per-object lock.

    Note that this only affects free-threaded builds. Apart from some minor
    refactoring, the added code is all either gated by `ifdef`s or is a no-op
    (e.g. `Py_BEGIN_CRITICAL_SECTION`).

 Include/cpython/weakrefobject.h                |   8 +
 Include/internal/pycore_interp.h               |   7 +
 Include/internal/pycore_object.h               |  40 +-
 Include/internal/pycore_pyatomic_ft_wrappers.h |   5 +
 Include/internal/pycore_weakref.h              |  73 +++-
 Lib/test/test_sys.py                           |   8 +-
 Lib/test/test_weakref.py                       |  19 +
 Modules/_sqlite/blob.c                         |   5 +-
 Modules/_sqlite/connection.c                   |   4 +-
 Modules/_ssl.c                                 |  13 +-
 Modules/_ssl/debughelpers.c                    |   6 +-
 Modules/_weakref.c                             |  42 +-
 Modules/clinic/_weakref.c.h                    |  20 +-
 Objects/dictobject.c                           |   8 +-
 Objects/typeobject.c                           |  12 +-
 Objects/weakrefobject.c                        | 537 ++++++++++++++-----------
 Python/pystate.c                               |   9 +
 17 files changed, 490 insertions(+), 326 deletions(-)
bisect found first bad commit

PR: #117168
Issue: #111926

The PR was merged three weeks ago and the CI is passing.


On my PR (https://github.com/python/cpython/actions/runs/8843882336?pr=118283), free-threaded builds pass, but regular ones fail. Ubuntu and macOS fail with:

AssertionError: -6 not found in (1, 120)

Windows with:

AssertionError: 3221225477 not found in (1, 120)

3221225477 seems to be 0xc0000005 STATUS_ACCESS_VIOLATION.
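
For readers decoding these numbers: on POSIX, `subprocess` reports a child killed by signal N as return code -N, and signal 6 is SIGABRT on Linux and macOS, which fits a fatal Python error ending in `abort()`. The Windows value is easier to recognise in hex. A minimal sketch follows; the helper name `describe_returncode` is made up for illustration and is not part of the test suite.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Give a human-readable reading of a subprocess return code.

    Hypothetical helper for illustration only.
    """
    if returncode < 0:
        # POSIX: subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    if returncode > 0x7FFFFFFF:
        # Large values on Windows are typically NTSTATUS codes; show them in hex.
        return f"NTSTATUS {returncode:#010x}"
    return f"exit status {returncode}"


for code in (-6, 3221225477, 1, 120):
    print(code, "->", describe_returncode(code))
```

The two values the test accepts, 1 and 120, come out as ordinary exit statuses, while the two failing values indicate the interpreter process crashed rather than exited.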

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

@hugovk hugovk added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes labels Apr 26, 2024
@colesbury
Contributor

I am able to reproduce it locally when checking out #118283 (but not in main for whatever reason).

The bug I saw was due to _PyObject_SetManagedDict calling _PyDict_DetachFromObject, which sets an exception (due to the no-memory condition). The exception is not handled, which causes the fatal error in _Py_Dealloc.

We should call PyErr_WriteUnraisable() or further propagate up the error.

cc @DinoV (I think the relevant PR is #114742)

@colesbury colesbury self-assigned this Apr 26, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Apr 26, 2024
When detaching a dict, the copy_values call may fail due to
out-of-memory errors. This can be triggered by test_no_memory in
test_repl.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 26, 2024
@mpage
Contributor

mpage commented Apr 27, 2024

@hugovk - I still can't repro this after merging gh-118334, but I did take a look through the weakref code and found a preexisting bug that might be what you're hitting. Can you try merging gh-118338 along with gh-118334 and see if it fixes things?

@hugovk
Member Author

hugovk commented Apr 27, 2024

Good news!

Thanks both!

mpage added a commit to mpage/cpython that referenced this issue Apr 29, 2024
colesbury pushed a commit that referenced this issue Apr 29, 2024
…ng weakrefs (#118338)

It's not safe to raise an exception in `PyObject_ClearWeakRefs()` if one
is not already set, since it may be called by `_Py_Dealloc()`, which
requires that the active exception does not change.

Additionally, make sure we clear the weakrefs even when tuple allocation
fails.
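
The invariant this commit message protects can be observed from Python: once the referent is deallocated, every weak reference to it is dead and each registered callback has run. A small sketch under the assumption of CPython's immediate reference-counted deallocation; the class `Node` and the list `calls` are hypothetical names for illustration.

```python
import weakref


class Node:
    """Plain class; instances of user-defined classes support weak references."""


calls = []
obj = Node()

# Register several weakrefs with callbacks on the same object.
refs = [weakref.ref(obj, lambda ref, i=i: calls.append(i)) for i in range(3)]
assert all(r() is obj for r in refs)

del obj  # On CPython the object is deallocated here, clearing its weakrefs.

print([r() for r in refs])  # every weakref is now dead
print(sorted(calls))        # every callback ran once
```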
colesbury added a commit that referenced this issue Apr 29, 2024
When detaching a dict, the `copy_values` call may fail due to
out-of-memory errors. This can be triggered by test_no_memory in
test_repl.
@colesbury
Contributor

@hugovk, both PRs are now merged. When you get a chance to merge main into your PR, would you please verify that everything is okay and close the issue?

@colesbury colesbury assigned hugovk and unassigned colesbury Apr 29, 2024
@hugovk
Member Author

hugovk commented Apr 29, 2024

Passing now, thank you!

After the first run, 5 tests failed on "Windows (free-threading) / build and test (x86)" but passed when restarted.

https://github.com/python/cpython/actions/runs/8884999534/attempts/1

https://github.com/python/cpython/actions/runs/8884999534/job/24396359883
