AttributeError: 'list' object has no attribute 'local_scope' #7292
Comments
@Rhett-Ying do we have DistDGL examples?
Please refer to the non-dist versions of the GAT/GCN models, such as https://github.com/dmlc/dgl/tree/master/examples/pytorch/gat, to make sure they are runnable. The model code should be the same in DistDGL and non-dist. A better option for running various models with distributed training/inference is GraphStorm, which offers high-level APIs.
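To illustrate the pattern that comment refers to: in DGL-style mini-batch models, `forward` receives a *list* of blocks and pairs each layer with one block, so `local_scope()` is only ever called on a single block, never on the list. The sketch below is dependency-free; `StubBlock`, `StubLayer`, and `StubModel` are hypothetical stand-ins for DGL types, not DGL API.

```python
from contextlib import nullcontext

class StubBlock:
    """Stand-in for a DGL message-flow-graph block."""
    def local_scope(self):
        # Real DGL returns a context manager; a no-op stands in here.
        return nullcontext()

class StubLayer:
    """Stand-in for a graph conv layer (e.g. SAGEConv, GATConv)."""
    def __call__(self, block, h):
        with block.local_scope():  # fails if `block` is actually a list
            return h + 1           # placeholder for message passing

class StubModel:
    def __init__(self, n_layers):
        self.layers = [StubLayer() for _ in range(n_layers)]

    def forward(self, blocks, h):
        # Correct pattern: zip layers with blocks so each layer
        # receives exactly one block, never the whole list.
        for layer, block in zip(self.layers, blocks):
            h = layer(block, h)
        return h

model = StubModel(n_layers=2)
blocks = [StubBlock(), StubBlock()]
print(model.forward(blocks, 0))  # prints 2

# Passing the whole list to a single layer reproduces the reported error:
# StubLayer()(blocks, 0)
# -> AttributeError: 'list' object has no attribute 'local_scope'
```

When porting a non-dist model (GAT, GCN, GIN) to DistDGL, the `forward(blocks, h)` loop above is the part that usually needs adapting from a full-graph `forward(g, h)`.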
Thanks for your advice. Since the "Gloo connectFullMesh failed with..." error is not resolved, I am trying to train some models from https://github.com/dmlc/dgl/tree/master/examples/pytorch/ on 2 machines. I would also like to ask about dataset partitioning: when partitioning the dataset with https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphsage/dist/partition_graph.py, the memory required is several times the size of the dataset. Are there any memory optimisations, or are other tools provided?
Unfortunately there is not much optimization available for the partition stage.
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.
Hi, I am closing this issue assuming you are happy about our response. Feel free to follow up and reopen the issue if you have more questions with regard to our response. |
🐛 Bug
When I run dgl\examples\pytorch\graphsage\dist\train_dist.py on GPUs as described in README.md, it works fine, but when I change the model's network layers the following problem occurs:
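The error named in the title is ordinary Python behavior: `local_scope()` is a method on DGL graph/block objects, and mini-batch sampling produces a *list* of blocks, which has no such method. A minimal, DGL-free reproduction of the mechanism (the string elements are just stand-ins for block objects):

```python
# Mini-batch sampling in DGL yields a list of blocks; the individual
# block objects have local_scope(), but the list itself does not.
blocks = ["block0", "block1"]  # stand-in for a sampled list of blocks

try:
    blocks.local_scope()  # what happens if the list reaches a layer
except AttributeError as e:
    print(e)  # prints: 'list' object has no attribute 'local_scope'
```

So the usual fix is to make sure each conv layer receives one block (or one graph), not the whole list.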
To Reproduce
Steps to reproduce the behavior:
execute
The cluster starts as expected and then the following problem occurs
execute
The information obtained is
Expected behavior
Apply distributed training to the training of other models, e.g. GAT, GCN, GIN, etc.
Environment
How you installed DGL (conda, pip, source): conda
Additional context
After reviewing the documentation on docs.dgl.ai, I am still unclear on how to resolve the following error:
The code in the dgl/examples/pytorch/graphsage/dist directory is quite enlightening, and I am interested in extending it to incorporate additional models. Any guidance you could offer would be greatly appreciated.
The command that executes the training has a few more parameters and paths than the command in README.md, because the following problems occur:
or
I have no idea how to solve this.
Once again, thank you for your exceptional work!