
[Note] Deadlock when running layers.GraphIsomorphismConv (issue located in sparse_coo_tensor) #238

jasperhyp opened this issue Nov 28, 2023
Hi! I had finished writing this issue before I managed to resolve it myself, but I thought it might still be helpful to others, so I am posting it anyway.

I am using Python 3.8 and PyTorch 1.12.1, and recently encountered an infinite wait when running a forward pass through GIN. This wasn't the case before, and I am unfortunately unsure which changes in my environment may have led to the issue. Nevertheless, I was able to track it down to lines 28-32 in utils.torch:

```python
cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
                   self.extra_ldflags, self.extra_include_paths, self.build_directory,
                   self.verbose, **self.kwargs)
```

The GIN layer calls utils.sparse_coo_tensor (lines 337-338 in layers.conv), which calls torch_ext.sparse_coo_tensor_unsafe(indices, values, size) at line 185 in utils.torch; this in turn goes through LazyExtensionLoader and thus the lines above.
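For reference, here is a minimal sketch of a forward pass that reaches this code path (the graph and dimensions are toy values for illustration; any forward pass through GraphIsomorphismConv should trigger the same lazy compilation on first use):

```python
import torch
from torchdrug import data, layers

# Toy graph; the exact structure does not matter, since any forward pass
# through GraphIsomorphismConv reaches utils.sparse_coo_tensor and thus the
# lazy JIT compilation of torch_ext.cpp on first use.
graph = data.Graph(edge_list=[[0, 1], [1, 2], [2, 0]], num_node=3)
feature = torch.randn(3, 16)

conv = layers.GraphIsomorphismConv(input_dim=16, output_dim=16)
out = conv(graph, feature)  # hangs here when the extension build deadlocks
```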

That call tries to load torch_ext.cpp and JIT-compile it; during compilation, however, a deadlock occurs in jit_compile. At this point the cause becomes clearer: an earlier compilation must have been interrupted, leaving a stale lock behind in the build cache, which every subsequent load then waits on forever. The solution is correspondingly straightforward: delete ~/.cache/torch_extensions/.
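If you prefer to script the cleanup, something like the following should work (a sketch; it assumes the default cache location and respects the TORCH_EXTENSIONS_DIR environment variable if set):

```python
import os
import shutil

# Default JIT build cache used by torch.utils.cpp_extension; it can be
# relocated via the TORCH_EXTENSIONS_DIR environment variable.
cache_dir = os.environ.get("TORCH_EXTENSIONS_DIR",
                           os.path.expanduser("~/.cache/torch_extensions"))
shutil.rmtree(cache_dir, ignore_errors=True)  # drop stale locks and partial builds
```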

As a final note, it appears that in PyTorch 1.13+ the performance issue with sparse_coo_tensor has been resolved (see this issue). I wonder whether torch_ext is still necessary at all, and whether it could be replaced by a built-in function, potentially from torch_sparse.
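For illustration, here is what such a replacement might look like with the public constructor (a sketch under the assumption that the 1.13+ fix makes the invariant checks cheap; not tested against torchdrug's full code path):

```python
import torch

indices = torch.tensor([[0, 1, 2],
                        [1, 2, 0]])
values = torch.ones(3)

# torch_ext.sparse_coo_tensor_unsafe exists to skip the index validation that
# made this constructor slow on older PyTorch; on 1.13+ the public call should
# presumably suffice.
adj = torch.sparse_coo_tensor(indices, values, (3, 3))
```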
