Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TSAN tests pass with the GIL disabled in free-threaded builds #117657

Open
4 of 31 tasks
Tracked by #108219
mpage opened this issue Apr 8, 2024 · 3 comments
Open
4 of 31 tasks
Tracked by #108219

Make TSAN tests pass with the GIL disabled in free-threaded builds #117657

mpage opened this issue Apr 8, 2024 · 3 comments
Assignees
Labels
topic-free-threading type-feature A feature request or enhancement

Comments

@mpage
Copy link
Contributor

mpage commented Apr 8, 2024

Feature or enhancement

We need to make the TSAN tests pass with the GIL disabled before we can disable the GIL by default in free-threaded builds.

This should proceed in two phases:

  1. Add suppressions for existing TSAN warnings.
  2. Triage, fix, and remove the suppressions for the warnings enumerated in (1).

How to run the TSAN tests

  1. Build with TSAN:
    env CC=clang CXX=clang++ ./configure --disable-gil --with-thread-sanitizer --with-pydebug && make -j
  2. Run tests:
    env TSAN_OPTIONS=suppressions=<repo_root>/Tools/tsan/suppressions_free_threading.txt ./python -m test --tsan -j4

Working on a race

  1. Verify that the TSAN tests are passing using the steps from above.
  2. Pick a suppression from the section below and assign it to yourself (edit it and add your username next to it). Some of them may be related. If you find that other suppressions are related to the race you're working on please assign them to yourself or contact their owner if they're already assigned.
  3. Optionally, head over to the race index to see examples and which tests reproduce the race corresponding to the suppression.
  4. Delete the suppression from <repo_root>/Tools/tsan/suppressions_free_threading.txt , run the TSAN tests, and verify that the race is reported by TSAN. You may need to comment out unrelated functions (notably, _PyEval_EvalFrameDefault) in order to reproduce the race.
  5. Fix the race.

Suppressions

Tasks

Linked PRs

@mpage mpage added type-feature A feature request or enhancement topic-free-threading labels Apr 8, 2024
@mpage mpage self-assigned this Apr 8, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Apr 12, 2024
colesbury pushed a commit that referenced this issue Apr 15, 2024
Additionally, reduce the iterations for a few weakref tests that would
otherwise take a prohibitively long amount of time (> 1 hour) when TSAN
is enabled and the GIL is disabled.
colesbury pushed a commit that referenced this issue Apr 15, 2024
…rld()` and `tstate_try_attach()` (#117828)

TSAN erroneously reports a data race between the `_Py_atomic_compare_exchange_int`
on `tstate->state` in `tstate_try_attach()` and the non-atomic load of
`tstate->state` in `start_the_world`. The `_Py_atomic_compare_exchange_int` fails,
but TSAN erroneously treats it as a store.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ython#117736)

Additionally, reduce the iterations for a few weakref tests that would
otherwise take a prohibitively long amount of time (> 1 hour) when TSAN
is enabled and the GIL is disabled.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…the_world()` and `tstate_try_attach()` (python#117828)

TSAN erroneously reports a data race between the `_Py_atomic_compare_exchange_int`
on `tstate->state` in `tstate_try_attach()` and the non-atomic load of
`tstate->state` in `start_the_world`. The `_Py_atomic_compare_exchange_int` fails,
but TSAN erroneously treats it as a store.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
DinoV pushed a commit that referenced this issue Apr 17, 2024
…#117954)

Fix data races in the method cache in free-threaded builds

These are technically data races, but I think they're benign (to
the extent that that is actually possible). We update cache entries
non-atomically but read them atomically from another thread, and there's
nothing that establishes a happens-before relationship between the
reads and writes that I can see.
DinoV added a commit to mpage/cpython that referenced this issue Apr 17, 2024
DinoV pushed a commit that referenced this issue Apr 17, 2024
…#117955)

Quiet erroneous TSAN reports of data races in `_PySeqLock`

TSAN reports a couple of data races between the compare/exchange in
`_PySeqLock_LockWrite` and the non-atomic loads in `_PySeqLock_{Abandon,Unlock}Write`.
This is another instance of TSAN incorrectly modeling failed compare/exchange
as a write instead of a load.
DinoV added a commit that referenced this issue Apr 19, 2024
)

Use relaxed load to check if dictkeys are immortal
DinoV added a commit that referenced this issue Apr 19, 2024
…th strong enough semantics (#118111)

Use acquire for load of ob_ref_shared
DinoV pushed a commit that referenced this issue Apr 23, 2024
… `tstate->state` (#118165)

Quiet TSAN warnings about remaining non-atomic accesses of `tstate->state`
colesbury pushed a commit that referenced this issue May 9, 2024
…H-118722) (#118870)

Using `race:` filters out warnings if the function appears anywhere in
the stack trace. This can hide a lot of unrelated warnings, especially
for a function like `_PyEval_EvalFrameDefault`, which is somewhere on
the stack more often than not.

Change all free-threaded suppressions to `race_top:`, which only matches
the top frame, and add any new suppressions this exposes.
(cherry picked from commit 98ff3f6)

Co-authored-by: Brett Simmers <swtaarrs@users.noreply.github.com>
mpage added a commit to mpage/cpython that referenced this issue May 9, 2024
mpage added a commit to mpage/cpython that referenced this issue May 9, 2024
mpage added a commit to mpage/cpython that referenced this issue May 9, 2024
SonicField pushed a commit to SonicField/cpython that referenced this issue May 10, 2024
…exit}__` (python#118812)

These methods are purely wrappers around `Semlock.{acquire,release}`,
which expect a critical section to be held.
SonicField pushed a commit to SonicField/cpython that referenced this issue May 10, 2024
…ython#118722)

Using `race:` filters out warnings if the function appears anywhere in
the stack trace. This can hide a lot of unrelated warnings, especially
for a function like `_PyEval_EvalFrameDefault`, which is somewhere on
the stack more often than not.

Change all free-threaded suppressions to `race_top:`, which only matches
the top frame, and add any new suppressions this exposes.
SonicField added a commit to SonicField/cpython that referenced this issue May 10, 2024
colesbury pushed a commit that referenced this issue May 10, 2024
…118865)

Use relaxed loads/stores when reading/writing to this field.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 10, 2024
…main` (pythonGH-118865)

Use relaxed loads/stores when reading/writing to this field.
(cherry picked from commit 22d5185)

Co-authored-by: mpage <mpage@meta.com>
colesbury pushed a commit that referenced this issue May 10, 2024
`_Py_qsbr_unregister` is called when the PyThreadState is already
detached, so the access to `tstate->qsbr` isn't safe without locking the
shared mutex. Grab the `struct _qsbr_shared` from the interpreter
instead.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 10, 2024
`_Py_qsbr_unregister` is called when the PyThreadState is already
detached, so the access to `tstate->qsbr` isn't safe without locking the
shared mutex. Grab the `struct _qsbr_shared` from the interpreter
instead.
(cherry picked from commit 33d2019)

Co-authored-by: Alex Turner <alexturner@meta.com>
colesbury pushed a commit that referenced this issue May 10, 2024
….main` (GH-118865) (#118904)

Use relaxed loads/stores when reading/writing to this field.
(cherry picked from commit 22d5185)

Co-authored-by: mpage <mpage@meta.com>
colesbury pushed a commit that referenced this issue May 10, 2024
`_Py_qsbr_unregister` is called when the PyThreadState is already
detached, so the access to `tstate->qsbr` isn't safe without locking the
shared mutex. Grab the `struct _qsbr_shared` from the interpreter
instead.
(cherry picked from commit 33d2019)

Co-authored-by: Alex Turner <alexturner@meta.com>
colesbury pushed a commit that referenced this issue May 10, 2024
)

This ensures we don't lose races that occur in subprocesses or
interleave races from workers running in parallel.

Log files are collected and packaged into a zipfile that can be
downloaded from the "Artifacts" section of the workflow run.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 10, 2024
…pythonGH-118747)

This ensures we don't lose races that occur in subprocesses or
interleave races from workers running in parallel.

Log files are collected and packaged into a zipfile that can be
downloaded from the "Artifacts" section of the workflow run.
(cherry picked from commit b88889e)

Co-authored-by: mpage <mpage@meta.com>
colesbury pushed a commit that referenced this issue May 10, 2024
GH-118747) (#118931)

This ensures we don't lose races that occur in subprocesses or
interleave races from workers running in parallel.

Log files are collected and packaged into a zipfile that can be
downloaded from the "Artifacts" section of the workflow run.
(cherry picked from commit b88889e)

Co-authored-by: mpage <mpage@meta.com>
DinoV pushed a commit that referenced this issue May 21, 2024
Fix itertools.count in free-threading mode
DinoV added a commit that referenced this issue May 22, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2024
Fix missing atomic in dict_resize
(cherry picked from commit 2b3fb76)

Co-authored-by: Dino Viehland <dinoviehland@meta.com>
DinoV added a commit that referenced this issue May 22, 2024
)

gh-117657: Fix missing atomic in dict_resize (GH-119312)

Fix missing atomic in dict_resize
(cherry picked from commit 2b3fb76)

Co-authored-by: Dino Viehland <dinoviehland@meta.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants