Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WithThreadsTestPool.test_release_task_refs: flaky test or possible free-threading bug #118413

Closed
colesbury opened this issue Apr 29, 2024 · 3 comments
Assignees
Labels
tests Tests in the Lib/test dir topic-free-threading

Comments

@colesbury
Copy link
Contributor

colesbury commented Apr 29, 2024

This might be a free-threading related bug or possibly a flaky test. We should investigate and try to figure out the root cause.

======================================================================
FAIL: test_release_task_refs (test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\_test_multiprocessing.py", line 2817, in test_release_task_refs
    self.assertEqual(set(wr() for wr in refs), {None})
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Items in the first set but not the second:
<test._test_multiprocessing.CountedObject object at 0x03E7CA50>

Originally reported by hugovk in #118331 (comment) (https://github.com/python/cpython/actions/runs/8884999534/attempts/1).

Linked PRs

@colesbury colesbury added tests Tests in the Lib/test dir topic-free-threading labels Apr 29, 2024
@colesbury colesbury changed the title WithThreadsTestPool.test_release_task_refs : AssertionError: Items in the first set but not the second WithThreadsTestPool.test_release_task_refs: flaky test or possible free-threading bug Apr 29, 2024
colesbury added a commit to colesbury/cpython that referenced this issue May 1, 2024
@colesbury
Copy link
Contributor Author

I'm able to reproduce this locally on Linux with a few modifications:

First, apply https://gist.github.com/colesbury/f8ba31699eb6fcc79a29a2dd4e41e817 to enable the test and run it in a loop.

Then run multiple test instances in parallel on a single CPU:

taskset -c 0 ./python -m test -j 8 test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs

@colesbury
Copy link
Contributor Author

The fix is to move the time.sleep(DELTA) before the gc.collect():

gc.collect() # For PyPy or other GCs.
time.sleep(DELTA) # let threaded cleanup code run

The time.sleep() is because the worker processes hold onto the result briefly after it's returned to the main thread. In the free-threaded build, we want the gc.collect() to happen after the workers no longer reference the the CountedObject() values because of the some objects passed across threads have their destructors delayed.

colesbury added a commit to colesbury/cpython that referenced this issue May 1, 2024
The `time.sleep()` call should happen before the GC to give the worker
threads time to clean-up their remaining references to objs.
Additionally, use `support.gc_collect()` instead of `gc.collect()`
just in case the extra GC calls matter.
colesbury added a commit that referenced this issue May 2, 2024
The `time.sleep()` call should happen before the GC to give the worker
threads time to clean-up their remaining references to objs.
Additionally, use `support.gc_collect()` instead of `gc.collect()`
just in case the extra GC calls matter.
@colesbury colesbury self-assigned this May 2, 2024
@colesbury
Copy link
Contributor Author

The test is fixed now and re-enabled in the free-threaded build.

SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
…ython#118494)

The `time.sleep()` call should happen before the GC to give the worker
threads time to clean-up their remaining references to objs.
Additionally, use `support.gc_collect()` instead of `gc.collect()`
just in case the extra GC calls matter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir topic-free-threading
Projects
None yet
Development

No branches or pull requests

1 participant