[GraphBolt] Add experimental ItemSet/Dict4 and ItemSampler4
#7371
base: master
Conversation
To trigger regression tests:
@Rhett-Ying Benchmark shows that the variation in performance is acceptable. I'm trying to find a way to let all replicas obtain a random seed from the main process instead of requiring users to set it manually, but that is a separate topic. For now, I think we can merge this PR first.
Benchmark on ogbn-products
Old:
$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3
Training with 4 gpus.
The dataset is already preprocessed.
Training...
48it [00:02, 16.06it/s]
Validating...
10it [00:00, 21.67it/s]
Epoch 00000 | Average Loss 2.3267 | Accuracy 0.7917 | Time 3.5637
48it [00:02, 21.37it/s]
Validating...
10it [00:00, 24.19it/s]
Epoch 00001 | Average Loss 0.9559 | Accuracy 0.8437 | Time 2.7528
48it [00:02, 21.33it/s]
Validating...
10it [00:00, 24.10it/s]
Epoch 00002 | Average Loss 0.7238 | Accuracy 0.8602 | Time 2.7597
48it [00:02, 21.33it/s]
Validating...
10it [00:00, 24.51it/s]
Epoch 00003 | Average Loss 0.6163 | Accuracy 0.8706 | Time 2.7502
48it [00:02, 21.45it/s]
Validating...
10it [00:00, 24.45it/s]
Epoch 00004 | Average Loss 0.5578 | Accuracy 0.8762 | Time 2.7404
48it [00:02, 20.19it/s]
Validating...
10it [00:00, 24.57it/s]
Epoch 00005 | Average Loss 0.5176 | Accuracy 0.8819 | Time 2.8776
48it [00:02, 21.50it/s]
Validating...
10it [00:00, 24.13it/s]
Epoch 00006 | Average Loss 0.4883 | Accuracy 0.8855 | Time 2.7396
48it [00:02, 21.42it/s]
Validating...
10it [00:00, 24.41it/s]
Epoch 00007 | Average Loss 0.4667 | Accuracy 0.8881 | Time 2.7437
48it [00:02, 21.31it/s]
Validating...
10it [00:00, 24.19it/s]
Epoch 00008 | Average Loss 0.4477 | Accuracy 0.8889 | Time 2.7596
48it [00:02, 21.46it/s]
Validating...
10it [00:00, 24.29it/s]
Epoch 00009 | Average Loss 0.4343 | Accuracy 0.8920 | Time 2.7416
Testing...
541it [00:19, 27.95it/s]
Test Accuracy 0.7348
New:
$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3
Training with 4 gpus.
The dataset is already preprocessed.
Training...
48it [00:03, 15.84it/s]
Validating...
10it [00:00, 22.02it/s]
Epoch 00000 | Average Loss 2.3048 | Accuracy 0.7777 | Time 3.5975
48it [00:02, 21.28it/s]
Validating...
10it [00:00, 25.05it/s]
Epoch 00001 | Average Loss 0.9804 | Accuracy 0.8388 | Time 2.7448
48it [00:02, 21.31it/s]
Validating...
10it [00:00, 24.98it/s]
Epoch 00002 | Average Loss 0.7427 | Accuracy 0.8587 | Time 2.7464
48it [00:02, 21.43it/s]
Validating...
10it [00:00, 25.03it/s]
Epoch 00003 | Average Loss 0.6308 | Accuracy 0.8696 | Time 2.7333
48it [00:02, 21.40it/s]
Validating...
10it [00:00, 25.19it/s]
Epoch 00004 | Average Loss 0.5623 | Accuracy 0.8785 | Time 2.7332
48it [00:02, 20.29it/s]
Validating...
10it [00:00, 24.69it/s]
Epoch 00005 | Average Loss 0.5228 | Accuracy 0.8815 | Time 2.8657
48it [00:02, 21.37it/s]
Validating...
10it [00:00, 24.89it/s]
Epoch 00006 | Average Loss 0.4937 | Accuracy 0.8850 | Time 2.7418
48it [00:02, 21.41it/s]
Validating...
10it [00:00, 25.01it/s]
Epoch 00007 | Average Loss 0.4696 | Accuracy 0.8879 | Time 2.7378
48it [00:02, 21.36it/s]
Validating...
10it [00:00, 25.03it/s]
Epoch 00008 | Average Loss 0.4537 | Accuracy 0.8909 | Time 2.7409
48it [00:02, 21.40it/s]
Validating...
10it [00:00, 24.88it/s]
Epoch 00009 | Average Loss 0.4388 | Accuracy 0.8932 | Time 2.7407
Testing...
541it [00:19, 27.96it/s]
Test Accuracy 0.7393
ogbn-arxiv
Old:
$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-arxiv
Training with 4 gpus.
The dataset is already preprocessed.
Training...
22it [00:01, 21.57it/s]
Validating...
8it [00:00, 52.40it/s]
Epoch 00000 | Average Loss 3.2543 | Accuracy 0.3002 | Time 1.2109
22it [00:00, 54.33it/s]
Validating...
8it [00:00, 70.41it/s]
Epoch 00001 | Average Loss 2.5287 | Accuracy 0.4404 | Time 0.5230
22it [00:00, 59.90it/s]
Validating...
8it [00:00, 71.66it/s]
Epoch 00002 | Average Loss 2.1985 | Accuracy 0.5054 | Time 0.4818
22it [00:00, 54.64it/s]
Validating...
8it [00:00, 86.39it/s]
Epoch 00003 | Average Loss 1.9795 | Accuracy 0.5349 | Time 0.4978
22it [00:00, 57.34it/s]
Validating...
8it [00:00, 78.11it/s]
Epoch 00004 | Average Loss 1.8419 | Accuracy 0.5529 | Time 0.4944
22it [00:00, 42.99it/s]
Validating...
8it [00:00, 73.39it/s]
Epoch 00005 | Average Loss 1.7533 | Accuracy 0.5649 | Time 0.6252
22it [00:00, 56.13it/s]
Validating...
8it [00:00, 76.69it/s]
Epoch 00006 | Average Loss 1.6852 | Accuracy 0.5713 | Time 0.5014
22it [00:00, 52.51it/s]
Validating...
8it [00:00, 79.52it/s]
Epoch 00007 | Average Loss 1.6405 | Accuracy 0.5766 | Time 0.5221
22it [00:00, 59.19it/s]
Validating...
8it [00:00, 67.85it/s]
Epoch 00008 | Average Loss 1.6055 | Accuracy 0.5814 | Time 0.4923
22it [00:00, 60.42it/s]
Validating...
8it [00:00, 71.80it/s]
Epoch 00009 | Average Loss 1.5681 | Accuracy 0.5878 | Time 0.4783
Testing...
12it [00:00, 82.86it/s]
Test Accuracy 0.5271
New:
$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-arxiv
Training with 4 gpus.
The dataset is already preprocessed.
Training...
22it [00:01, 18.31it/s]
Validating...
8it [00:00, 54.37it/s]
Epoch 00000 | Average Loss 3.1735 | Accuracy 0.2941 | Time 1.3790
22it [00:00, 58.89it/s]
Validating...
8it [00:00, 78.07it/s]
Epoch 00001 | Average Loss 2.4895 | Accuracy 0.4520 | Time 0.4908
22it [00:00, 56.94it/s]
Validating...
8it [00:00, 73.67it/s]
Epoch 00002 | Average Loss 2.1515 | Accuracy 0.5135 | Time 0.5007
22it [00:00, 54.02it/s]
Validating...
8it [00:00, 69.11it/s]
Epoch 00003 | Average Loss 1.9372 | Accuracy 0.5381 | Time 0.5256
22it [00:00, 56.69it/s]
Validating...
8it [00:00, 70.72it/s]
Epoch 00004 | Average Loss 1.8119 | Accuracy 0.5560 | Time 0.5067
22it [00:00, 39.94it/s]
Validating...
8it [00:00, 74.97it/s]
Epoch 00005 | Average Loss 1.7279 | Accuracy 0.5639 | Time 0.6646
22it [00:00, 56.77it/s]
Validating...
8it [00:00, 79.99it/s]
Epoch 00006 | Average Loss 1.6723 | Accuracy 0.5734 | Time 0.4928
22it [00:00, 60.43it/s]
Validating...
8it [00:00, 71.34it/s]
Epoch 00007 | Average Loss 1.6253 | Accuracy 0.5817 | Time 0.4789
22it [00:00, 58.53it/s]
Validating...
8it [00:00, 91.09it/s]
Epoch 00008 | Average Loss 1.5881 | Accuracy 0.5844 | Time 0.4690
22it [00:00, 56.57it/s]
Validating...
8it [00:00, 77.58it/s]
Epoch 00009 | Average Loss 1.5577 | Accuracy 0.5878 | Time 0.4972
Testing...
12it [00:00, 88.09it/s]
Test Accuracy 0.5279
ogbn-papers100M
Old:
$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-papers100M
Training with 4 gpus.
The dataset is already preprocessed.
Training...
294it [00:22, 13.15it/s]
Validating...
31it [00:02, 14.12it/s]
Epoch 00000 | Average Loss 1.9491 | Accuracy 0.5924 | Time 24.7810
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.54it/s]
Epoch 00001 | Average Loss 1.3033 | Accuracy 0.6245 | Time 23.8770
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.58it/s]
Epoch 00002 | Average Loss 1.2215 | Accuracy 0.6469 | Time 23.8830
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.56it/s]
Epoch 00003 | Average Loss 1.1796 | Accuracy 0.6448 | Time 23.8804
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.58it/s]
Epoch 00004 | Average Loss 1.1523 | Accuracy 0.6533 | Time 23.8787
294it [00:21, 13.58it/s]
Validating...
31it [00:02, 14.54it/s]
Epoch 00005 | Average Loss 1.1338 | Accuracy 0.6464 | Time 23.9888
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.55it/s]
Epoch 00006 | Average Loss 1.1200 | Accuracy 0.6503 | Time 23.8843
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.52it/s]
Epoch 00007 | Average Loss 1.1080 | Accuracy 0.6569 | Time 23.8870
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.53it/s]
Epoch 00008 | Average Loss 1.0979 | Accuracy 0.6615 | Time 23.8950
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.53it/s]
Epoch 00009 | Average Loss 1.0894 | Accuracy 0.6603 | Time 23.8899
Testing...
53it [00:03, 14.50it/s]
Test Accuracy 0.6318
New:
$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-papers100M
Training with 4 gpus.
The dataset is already preprocessed.
Training...
294it [00:21, 13.69it/s]
Validating...
31it [00:02, 14.19it/s]
Epoch 00000 | Average Loss 1.9418 | Accuracy 0.5957 | Time 23.8790
294it [00:20, 14.18it/s]
Validating...
31it [00:02, 14.65it/s]
Epoch 00001 | Average Loss 1.3039 | Accuracy 0.6233 | Time 23.0518
294it [00:20, 14.19it/s]
Validating...
31it [00:02, 14.57it/s]
Epoch 00002 | Average Loss 1.2206 | Accuracy 0.6458 | Time 23.0501
294it [00:20, 14.18it/s]
Validating...
31it [00:02, 14.62it/s]
Epoch 00003 | Average Loss 1.1800 | Accuracy 0.6493 | Time 23.0555
294it [00:20, 14.17it/s]
Validating...
31it [00:02, 14.54it/s]
Epoch 00004 | Average Loss 1.1533 | Accuracy 0.6571 | Time 23.0787
294it [00:20, 14.11it/s]
Validating...
31it [00:02, 14.58it/s]
Epoch 00005 | Average Loss 1.1354 | Accuracy 0.6563 | Time 23.1551
294it [00:20, 14.19it/s]
Validating...
31it [00:02, 14.56it/s]
Epoch 00006 | Average Loss 1.1197 | Accuracy 0.6585 | Time 23.0504
294it [00:20, 14.18it/s]
Validating...
31it [00:02, 14.57it/s]
Epoch 00007 | Average Loss 1.1088 | Accuracy 0.6571 | Time 23.0587
294it [00:20, 14.21it/s]
Validating...
31it [00:02, 14.53it/s]
Epoch 00008 | Average Loss 1.0991 | Accuracy 0.6616 | Time 23.0182
294it [00:20, 14.20it/s]
Validating...
31it [00:02, 14.57it/s]
Epoch 00009 | Average Loss 1.0909 | Accuracy 0.6632 | Time 23.0365
Testing...
53it [00:03, 14.53it/s]
Test Accuracy 0.6337

Tested on g4dn.metal.
@Rhett-Ying The issue of the random seed has been resolved. What a relief that torch.distributed has convenient communication APIs.
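For illustration, here is a minimal sketch of how a random seed can be shared across replicas with torch.distributed, as described above. The function name `sync_random_seed` and its exact shape are assumptions for this example, not the PR's actual code.

```python
import torch
import torch.distributed as dist


def sync_random_seed(device="cpu"):
    """Draw a random seed on rank 0 and broadcast it to every replica.

    All ranks must call this collectively; after the broadcast, each
    rank holds the same value and can seed its sampler identically.
    """
    seed = torch.empty((), dtype=torch.int64, device=device)
    if dist.get_rank() == 0:
        seed.random_(0, 2**31)  # only the main process draws the seed
    dist.broadcast(seed, src=0)  # all ranks now hold rank 0's value
    return int(seed.item())
```

This avoids asking users to set the seed manually: each training process calls the function once after `init_process_group`, and all replicas end up shuffling in the same order.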
This POC works well in terms of both correctness and performance. Now it's time to finalize the code change.
- Is it possible to update the existing ItemSampler instead of creating a new class? It seems the major part is fixing the seed.
- Is it possible to split the change on ItemSampler and ItemSet/Dict to make each change as small as possible for quick review?
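The reviewer's first suggestion can be sketched as follows. This is an illustrative stand-in only: the class name `SeededItemSampler`, its constructor signature, and the epoch-offset trick are assumptions for the example, not GraphBolt's real API.

```python
import random


class SeededItemSampler:
    """Toy sampler showing how a `seed` argument could extend an
    existing sampler instead of introducing a new class.

    When every replica is constructed with the same seed, all replicas
    shuffle items into the same order, which keeps minibatch assignment
    consistent across GPUs.
    """

    def __init__(self, items, batch_size, shuffle=False, seed=None):
        self._items = list(items)
        self._batch_size = batch_size
        self._shuffle = shuffle
        self._seed = seed
        self._epoch = 0

    def __iter__(self):
        order = list(range(len(self._items)))
        if self._shuffle:
            # Same seed on every replica -> same order on every replica.
            # Adding the epoch varies the order across epochs while
            # staying deterministic for a given seed.
            base = 0 if self._seed is None else self._seed
            random.Random(base + self._epoch).shuffle(order)
        self._epoch += 1
        for i in range(0, len(order), self._batch_size):
            yield [self._items[j] for j in order[i : i + self._batch_size]]
```

With this shape, the seed handling is the only new moving part, so the ItemSampler change and the ItemSet/Dict change could indeed be reviewed separately.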
@@ -36,6 +36,7 @@
│
└───> Test set evaluation
"""
Does it work well with --num-workers 2 for multiple GPUs?
Both the old and new implementations encounter the same error with --num-workers 2:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/torch/utils/data/datapipes/datapipe.py", line 359, in __setstate__
self._datapipe = dill.loads(value)
File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/dill/_dill.py", line 303, in loads
return load(file, ignore, **kwds)
File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/dill/_dill.py", line 289, in load
return Unpickler(file, ignore=ignore, **kwds).load()
File "/home/ubuntu/miniconda3/envs/dgl/lib/python3.9/site-packages/dill/_dill.py", line 444, in load
obj = StockUnpickler.load(self)
AttributeError: 'PyCapsule' object has no attribute 'cudaHostUnregister'
Is this a long-standing problem? Or is there something wrong with my package version?
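The traceback above fails inside dill while the spawned worker deserializes the datapipe. A quick way to check whether a given pipeline object would survive this handoff is to round-trip it through the pickler before handing it to the DataLoader. The helper name `survives_spawn` is hypothetical, and this sketch uses the stdlib `pickle` as an approximation of the dill round-trip in the traceback.

```python
import pickle


def survives_spawn(obj):
    """Return True if obj round-trips through pickle, as spawn-based
    DataLoader workers require; print the failure reason otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception as exc:
        print(f"not picklable: {exc!r}")
        return False
```

Running this on the datapipe before training would localize which stage holds the unpicklable PyCapsule, which should help when filing the issue.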
I'm afraid no one has run the multi-GPU example with multiple num_workers before. Please file an issue and look into it.
I'm afraid the change on
Sounds good to me.
Description
benchmark:
Checklist
Please feel free to remove inapplicable items for your PR.
Changes