torch.Library can easily cause segfault on loading/unloading #125234
Labels
module: library
Related to torch.library (for registering ops from Python)
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Describe the bug
When a segfault occurs, it looks something like this:
Unfortunately, I was unable to make a self-contained reproducer. pytest seems to be doing something unusual with object lifetimes that causes the problem. To reproduce: replace the test_lib._destroy() call with pass, run pip install detect-test-pollution, create a testids.txt file with the test IDs, and then run the test with:
pytest test/test_fake_tensor.py -p detect_test_pollution --dtp-testids-input-file testids.txt
As you can see above, explicitly destroying the library object fixes the problem. This is fairly reminiscent of typical pytest pathology, where pytest keeps things alive longer than they should be. But it is not only keeping things alive, because if you intentionally leak the test_lib object (e.g., by assigning it to a global), you get this error as you were hoping for:
Very puzzling!
I'm not going to investigate the problem any further, but I will note that explicit deallocation solves it, so we should probably prophylactically update every use of torch.Library in the test suite to go through a scoped handler that deallocates the library explicitly.
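As a rough illustration of what such a scoped handler could look like, here is a minimal sketch of a context manager that always calls _destroy() when the block exits, even if the test body raises. The names scoped_library and FakeLibrary are hypothetical and only for illustration; FakeLibrary stands in for the real library class so that the sketch runs without PyTorch installed. The only thing assumed about the real object is that it has a _destroy() method, as the test_lib._destroy() call above suggests.

```python
from contextlib import contextmanager


@contextmanager
def scoped_library(factory, *args, **kwargs):
    """Create a library object and always destroy it on exit.

    `factory` is assumed to be a callable such as torch.library.Library.
    The returned object only needs to expose a `_destroy()` method.
    """
    lib = factory(*args, **kwargs)
    try:
        yield lib
    finally:
        # Explicit deallocation instead of relying on garbage collection,
        # which pytest may delay by keeping references alive.
        lib._destroy()


class FakeLibrary:
    """Hypothetical stand-in so the sketch runs without PyTorch."""

    def __init__(self, ns, kind):
        self.ns = ns
        self.kind = kind
        self.destroyed = False

    def _destroy(self):
        self.destroyed = True


with scoped_library(FakeLibrary, "my_test_ns", "FRAGMENT") as test_lib:
    # ... register ops and run the test body here ...
    assert not test_lib.destroyed

# The library has been torn down by the time the block exits.
assert test_lib.destroyed
```

In an actual test one would presumably pass the real class as the factory, for example scoped_library(torch.library.Library, "my_test_ns", "FRAGMENT"), so that no test can forget the teardown step.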
Versions
main
cc @anjali411