
multi gpu #1

Open
yufengwhy opened this issue Dec 27, 2021 · 4 comments

@yufengwhy

yufengwhy commented Dec 27, 2021

Can we use this code with multiple GPUs? If so, could you give some examples in the README? Thanks!

Say there are 1 billion nodes and 60 billion edges; the matrix would then be around 500 GB, while an A100 has only 80 GB of memory.
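For scale (under assumptions not stated above: 128-dimensional float32 node features, int32 edge indices), either reading of "the matrix" lands near that figure:

num_nodes, num_edges = 10**9, 60 * 10**9
feat_dim, feat_bytes = 128, 4                     # assumed: 128-dim float32 features
idx_bytes = 4                                     # assumed: int32 edge indices
print(num_nodes * feat_dim * feat_bytes / 1e9)    # ~512 GB feature matrix
print(num_edges * 2 * idx_bytes / 1e9)            # ~480 GB edge list (src, dst)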

@davidmin7
Collaborator

Hi, thank you for your interest in our work. Yes, you can use multiple GPUs with this implementation by allocating a shared memory space and pinning it with the unified tensor. However, we are pushing our idea into the DGL repository and some upgrades are coming soon (dmlc/dgl#3616), so you can take a look there as well!

@yufengwhy
Author

I think huge matrix multiplication is a very basic operation that is not specific to graphs, so could we implement it as a standalone building block that is not coupled to the graph code?

@davidmin7
Collaborator

Hi, yes, the latter link is about the graph case (if you need that), but DGL itself supports the unified tensor. Please see the documentation here: https://docs.dgl.ai/en/latest/api/python/dgl.contrib.UnifiedTensor.html. You simply need to declare the unified tensor on a shared-memory space.

@davidmin7
Collaborator

Some quick examples (may need some syntax corrections):

For the single GPU case:

import torch
import dgl

def train(feat_matrix, ...):
    # Pin feat_matrix and expose it to the GPU as a unified tensor.
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=torch.device('cuda'))
    # Access feat_matrix_unified from here
    # e.g., a = feat_matrix_unified[indices]
    # !!! "indices" must be a cuda tensor !!!

if __name__ == '__main__':
    ...
    feat_matrix = dataload()  # some user-defined function that loads the data into a torch tensor
    train(feat_matrix, ...)

For the multi GPU case:

import torch
import torch.multiprocessing as mp
import dgl

def train(feat_matrix, device, ...):
    torch.cuda.set_device(device)
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=torch.device(device))
    # Access feat_matrix_unified from here
    # e.g., a = feat_matrix_unified[indices]
    # !!! "indices" must be a cuda tensor !!!

if __name__ == '__main__':
    ...
    feat_matrix = dataload()  # some user-defined function that loads the data into a torch tensor
    feat_matrix = feat_matrix.share_memory_()  # place in shared memory so every worker sees the same tensor
    ...
    procs = []
    for proc_id in range(n_gpus):
        # each worker gets its own GPU index as the device
        p = mp.Process(target=train, args=(feat_matrix, proc_id, ...))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()

Hope this helps!
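As a further option, the same multi-GPU pattern can also be launched with torch.multiprocessing.spawn, which hands each worker its process index and joins all workers for you. A minimal sketch, with random placeholder data standing in for the real dataload() output:

import torch
import torch.multiprocessing as mp
import dgl

def train(proc_id, n_gpus, feat_matrix):
    device = torch.device('cuda', proc_id)
    torch.cuda.set_device(device)
    # Pin the shared-memory tensor for this GPU and index it with GPU-resident indices.
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=device)
    indices = torch.arange(1024, device=device)  # placeholder indices; must be a cuda tensor
    batch = feat_matrix_unified[indices]

if __name__ == '__main__':
    n_gpus = torch.cuda.device_count()
    feat_matrix = torch.randn(100_000, 128).share_memory_()  # placeholder for the real node features
    # spawn() passes proc_id as the first argument to train() and joins all workers.
    mp.spawn(train, args=(n_gpus, feat_matrix), nprocs=n_gpus)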
