[GraphBolt][MultiGPU] Error occurs when running multiGPU example with num-workers > 0 #7381

Open
Skeleton003 opened this issue May 8, 2024 · 0 comments

@Skeleton003
Collaborator

🔨Work Item

IMPORTANT:

  • This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
  • DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

When running python examples/multigpu/graphbolt/node_classification.py --num-workers=2 (any value greater than 0 reproduces it), the following error is raised in every distributed replica:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/torch/utils/data/datapipes/datapipe.py", line 359, in __setstate__
    self._datapipe = dill.loads(value)
  File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/dill/_dill.py", line 303, in loads
    return load(file, ignore, **kwds)
  File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/dill/_dill.py", line 289, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/dill/_dill.py", line 444, in load
    obj = StockUnpickler.load(self)
AttributeError: 'PyCapsule' object has no attribute 'cudaHostUnregister'
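
The traceback shows the failure occurring while a spawned worker process unpickles the datapipe sent by its parent (spawn.py calls reduction.pickle.load, and the datapipe's __setstate__ calls dill.loads). One way to narrow down which piece of state does not survive serialization is to round-trip each piece separately in the parent process. The following is only a minimal sketch of that idea, not the project's fix: find_unpicklable and pipeline_state are hypothetical names, a threading.Lock stands in for a process-local native handle, and stdlib pickle is used instead of dill to keep the example dependency-free. An in-process round trip may also miss errors that only appear in a freshly spawned interpreter.

```python
import pickle
import threading


def find_unpicklable(state):
    """Return {key: exception} for every value in `state` that fails a pickle round trip."""
    failures = {}
    for key, value in state.items():
        try:
            # Serialize and immediately deserialize, mimicking what happens
            # when an object is shipped to a spawned worker process.
            pickle.loads(pickle.dumps(value))
        except Exception as exc:  # error types vary: TypeError, AttributeError, PicklingError
            failures[key] = exc
    return failures


# Hypothetical stand-in for the attributes a data pipeline might carry.
# The lock plays the role of a native handle that cannot cross process boundaries.
pipeline_state = {
    "batch_size": 1024,
    "fanout": [10, 10, 10],
    "native_handle": threading.Lock(),
}

failures = find_unpicklable(pipeline_state)
for key, exc in failures.items():
    print(f"{key}: {type(exc).__name__}: {exc}")
```

Applied to a real pipeline, the same loop could be pointed at an object's __dict__ to see which attribute is responsible before any worker processes are started.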

Depending work items or issues

Skeleton003 added the Work Item label (work items tracked in project tracker) on May 8, 2024
Skeleton003 self-assigned this on May 8, 2024