
multi gpu #1

Open
yufengwhy opened this issue Dec 27, 2021 · 4 comments

@yufengwhy

yufengwhy commented Dec 27, 2021

Can we use this code with multiple GPUs? If so, could you give some examples in the README? Thanks!

Say there are 1 billion nodes and 60 billion edges; the matrix would then be around 500 GB, while an A100 has only 80 GB of memory.
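For scale (under assumptions not stated above: 128-dimensional float32 node features, int32 edge indices), either reading of "the matrix" lands near that figure:

num_nodes, num_edges = 10**9, 60 * 10**9
feat_dim, feat_bytes = 128, 4                     # assumed: 128-dim float32 features
idx_bytes = 4                                     # assumed: int32 edge indices
print(num_nodes * feat_dim * feat_bytes / 1e9)    # ~512 GB feature matrix
print(num_edges * 2 * idx_bytes / 1e9)            # ~480 GB edge list (src, dst)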

@davidmin7
Collaborator

Hi, thank you for your interest in our work. Yes, you can use multiple GPUs with this implementation by allocating a shared memory space and pinning it with the unified tensor. However, we are pushing our idea into the DGL repository and some upgrades are coming soon (dmlc/dgl#3616), so you can take a look there as well!

@yufengwhy
Author

I think huge matrix multiplication is a very basic operation that is not specific to graphs, so could we implement it as a standalone building block that is not coupled to the graph code?

@davidmin7
Collaborator

Hi, yes, the latter link is about the graph case (if you need that), but DGL itself supports the unified tensor. Please see the documentation here: https://docs.dgl.ai/en/latest/api/python/dgl.contrib.UnifiedTensor.html. You simply need to declare the unified tensor on a shared-memory space.

@davidmin7
Collaborator

Some quick examples (may need some syntax corrections):

For the single GPU case:

import torch
import dgl

def train(feat_matrix, ...):
    # Pin feat_matrix and expose it to the GPU as a unified tensor.
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=torch.device('cuda'))
    # Access feat_matrix_unified from here
    # e.g., a = feat_matrix_unified[indices]
    # !!! "indices" must be a cuda tensor !!!

if __name__ == '__main__':
    ...
    feat_matrix = dataload()  # some user-defined function that loads the data into a torch tensor
    train(feat_matrix, ...)

For the multi GPU case:

import torch
import torch.multiprocessing as mp
import dgl

def train(feat_matrix, device, ...):
    torch.cuda.set_device(device)
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=torch.device(device))
    # Access feat_matrix_unified from here
    # e.g., a = feat_matrix_unified[indices]
    # !!! "indices" must be a cuda tensor !!!

if __name__ == '__main__':
    ...
    feat_matrix = dataload()  # some user-defined function that loads the data into a torch tensor
    feat_matrix = feat_matrix.share_memory_()  # place in shared memory so every worker sees the same tensor
    ...
    procs = []
    for proc_id in range(n_gpus):
        # each worker gets its own GPU index as the device
        p = mp.Process(target=train, args=(feat_matrix, proc_id, ...))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()

Hope this helps!
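As a further option, the same multi-GPU pattern can also be launched with torch.multiprocessing.spawn, which hands each worker its process index and joins all workers for you. A minimal sketch, with random placeholder data standing in for the real dataload() output:

import torch
import torch.multiprocessing as mp
import dgl

def train(proc_id, n_gpus, feat_matrix):
    device = torch.device('cuda', proc_id)
    torch.cuda.set_device(device)
    # Pin the shared-memory tensor for this GPU and index it with GPU-resident indices.
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=device)
    indices = torch.arange(1024, device=device)  # placeholder indices; must be a cuda tensor
    batch = feat_matrix_unified[indices]

if __name__ == '__main__':
    n_gpus = torch.cuda.device_count()
    feat_matrix = torch.randn(100_000, 128).share_memory_()  # placeholder for the real node features
    # spawn() passes proc_id as the first argument to train() and joins all workers.
    mp.spawn(train, args=(n_gpus, feat_matrix), nprocs=n_gpus)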
