ARGO: an easy-to-use runtime to improve GNN training performance on multi-core processors #8841

jasonlin316 · 2024-01-31T02:31:54Z

Overview

GNN training performance on multi-core processors is limited because the current design does not scale well. We propose ARGO, a runtime system that improves the scalability of GNN training on multi-core processors. On a CPU platform where the original program scales only to 16 cores (meaning that using more than 16 cores yields no further performance improvement), ARGO scales the design up to 64 cores and achieves up to 5x speedup over the original design without ARGO.


wsad1

This is great, thanks for adding!
I need your help with a few things before we can merge this PR:

Add a link to the paper and a short explanation of how ARGO speeds things up.
Could you share some benchmarks on PyG models, showing how much ARGO speeds up PyG models on high-core-count CPUs?
Add type hints to all functions, plus a short doc for each function.

wsad1 · 2024-02-09T07:26:43Z

examples/argo.py

+        self.acq_func = 'EI' # acquisition function of the auto-tuner
+        self.counter = [0]
+
+    def core_binder(self, num_cpu_proc, n_samp, n_train, rank):


Add type hints.
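For illustration, a type-hinted core binder could look like the sketch below. It is a standalone function rather than the method from this PR, and the contiguous per-process core layout, the return type, and the rank check are all assumptions made for the example, not ARGO's actual logic. Only the parameter names come from the diff above.

```python
from typing import List, Tuple


def core_binder(num_cpu_proc: int, n_samp: int, n_train: int,
                rank: int) -> Tuple[List[int], List[int]]:
    """Returns the core ids assigned to one training process.

    Assumption: every process owns a contiguous block of
    ``n_samp + n_train`` cores, with sampling cores first.

    Args:
        num_cpu_proc (int): Total number of training processes.
        n_samp (int): Number of sampling cores per process.
        n_train (int): Number of training cores per process.
        rank (int): Index of the calling process.
    """
    if not 0 <= rank < num_cpu_proc:
        raise ValueError(f"rank must be in [0, {num_cpu_proc}), got {rank}")
    cores_per_proc = n_samp + n_train
    start = rank * cores_per_proc  # first core owned by this process
    sampling_cores = list(range(start, start + n_samp))
    training_cores = list(range(start + n_samp, start + cores_per_proc))
    return sampling_cores, training_cores
```

With 4 processes, 2 sampling cores, and 6 training cores each, rank 1 would own cores 8 to 15 under this layout.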

wsad1 · 2024-02-09T07:26:52Z

examples/argo.py

+                             acq_func=self.acq_func)
+        return result
+
+    def mp_engine(self, x, train, args, ep):  # Multi-Process Engine


Type hints here too.

jasonlin316 · 2024-02-10

Thanks for your suggestions!
I added type hints to each function and provided a short doc.
A description of ARGO is now available at the top of the file.
The benchmarking results are available in the paper (Fig. 10 and 11).
Please let me know if you need anything else from me :)

akihironitta

Looks interesting! Do you think we could have an end-to-end example?

akihironitta · 2024-02-10T23:12:12Z

examples/argo.py

+        Parameters
+        ----------


Can we follow the same docstring style as the PyG codebase?

Suggested change

-        Parameters
-        ----------
+        Args:

jasonlin316 · 2024-02-11

Hmm... I can change the term to "Args", but I think the PyG docs use "PARAMETERS"?
For example, see the docs here: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html

akihironitta · 2024-02-13

The built docs show "PARAMETERS", but in the codebase we follow this docstring style, afaik: https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
Please feel free to check how the docstrings are written in the codebase.

jasonlin316 · 2024-02-13

Understood. I have updated the documentation accordingly.
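As a reference point for the style discussed above, here is a minimal sketch of a Google-style docstring with an "Args:" section. The function `describe_search` and its `n_search` parameter are hypothetical and exist only to carry the docstring; `random_state` and its description are taken from the diff in this review.

```python
def describe_search(n_search: int, random_state: int) -> str:
    r"""Summarizes an auto-tuner search budget.

    Args:
        n_search (int): Total number of configurations to evaluate.
        random_state (int): Number of random initializations before
            searching.

    Returns:
        str: A one-line, human-readable summary.
    """
    return f"{n_search} searches, {random_state} random initializations"
```

The built documentation may still render this section under a "PARAMETERS" heading, which is consistent with both observations in the thread.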

akihironitta · 2024-02-10T23:13:00Z

examples/argo.py

+        random_state: int
+            Number of random initializations before searching
+
+        acq_function: str


Looks like acq_function isn't used. Mind updating the docstring or the code?

jasonlin316 · 2024-02-11

Sure thing.
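To make the role of the tuner parameters more concrete, the sketch below shows a plain random search over runtime configurations. This is deliberately not the acquisition-function-driven search the PR configures with 'EI'; random search is swapped in to keep the example dependency-free. The tuned triple (number of processes, sampling cores, training cores) is an assumption based on the `core_binder` signature, and `random_search` and `measure` are hypothetical names.

```python
import random
from typing import Callable, Optional, Tuple

# Assumed search space: (num processes, sampling cores, training cores).
Config = Tuple[int, int, int]


def random_search(measure: Callable[[Config], float], total_cores: int,
                  n_search: int, seed: int = 0) -> Tuple[Config, float]:
    """Returns the fastest configuration found among random candidates.

    Args:
        measure (Callable): Maps a configuration to a measured epoch time.
        total_cores (int): Number of cores available on the machine.
        n_search (int): Number of configurations to evaluate.
        seed (int): Seed for the random number generator.
    """
    if total_cores < 2 or n_search < 1:
        raise ValueError("need at least 2 cores and 1 search step")
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_time = float("inf")
    for _ in range(n_search):
        # Each process needs at least one sampling and one training core.
        n_proc = rng.randint(1, total_cores // 2)
        per_proc = total_cores // n_proc
        n_samp = rng.randint(1, per_proc - 1)
        cfg = (n_proc, n_samp, per_proc - n_samp)
        elapsed = measure(cfg)
        if best_cfg is None or elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    assert best_cfg is not None
    return best_cfg, best_time
```

In a real tuner, `measure` would run one training epoch with the given configuration and return its wall-clock time.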

jasonlin316 · 2024-02-11T05:51:45Z

Looks interesting! Do you think we could have an end-to-end example?

Yes, I do have an end-to-end example: https://github.com/jasonlin316/ARGO/tree/main/PyG
Please see flickr_example_ARGO.py, which enables ARGO on flickr_example.py.

akihironitta

I feel like we should have a complete example because only looking at this class may not be clear enough for most PyG users to adapt it to their use cases IMHO.

jasonlin316 · 2024-02-13T18:37:42Z

I feel like we should have a complete example because only looking at this class may not be clear enough for most PyG users to adapt it to their use cases IMHO.

Yes, I agree having a complete example is important for the users.
@rusty1s Do you want to comment on this? I recall the plan is to add ARGO to the PyG package, and then add an example. Should we finalize this PR first, and then create another PR for the end-to-end example?


rusty1s · 2024-02-16T10:40:21Z

examples/argo.py

+import time
+from typing import Callable, Tuple
+
+import dgl.multiprocessing as dmp


Is this needed? We don't want to have a dgl dependency in PyG. Can't we use Python/PyTorch for this?
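One way to drop the dgl dependency is sketched below using only Python's standard `multiprocessing` module. The helper name `run_workers` and the `train(rank, num_proc)` calling convention are assumptions for illustration, not the PR's API. `torch.multiprocessing` wraps the same standard module, so the same pattern should carry over.

```python
import multiprocessing as mp
from typing import Callable, List, Optional


def run_workers(train: Callable[[int, int], None], num_proc: int,
                start_method: str = "spawn") -> List[Optional[int]]:
    """Runs ``train(rank, num_proc)`` in ``num_proc`` separate processes.

    Args:
        train (Callable): Worker function; must be picklable when the
            start method is ``"spawn"``.
        num_proc (int): Number of worker processes to launch.
        start_method (str): Any start method the platform supports.

    Returns:
        list: The exit code of each worker, ordered by rank.
    """
    ctx = mp.get_context(start_method)
    procs = [
        ctx.Process(target=train, args=(rank, num_proc))
        for rank in range(num_proc)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()  # wait for every worker before reading exit codes
    return [proc.exitcode for proc in procs]
```

With the "spawn" start method, the call has to sit under an `if __name__ == "__main__":` guard and `train` has to be defined at module level.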

rusty1s · 2024-02-16T10:43:29Z

@jasonlin316 I moved this into the torch_geometric package directly, moved the docstring, and adjusted the input arguments. If you have time, can you follow up by adjusting the docs/arguments in the remainder? In addition, how hard would it be to test ARGO in test/nn/models/test_argo.py?

codecov · 2024-02-16T10:49:15Z

Codecov Report

Attention: 85 lines in your changes are missing coverage. Please review.

Comparison is base (1b195a0) 89.26% compared to head (15c2080) 89.01%.
Report is 2 commits behind head on master.

Files	Patch %	Lines
torch_geometric/nn/models/argo.py	0.00%	85 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8841      +/-   ##
==========================================
- Coverage   89.26%   89.01%   -0.26%     
==========================================
  Files         468      469       +1     
  Lines       29960    30045      +85     
==========================================
  Hits        26744    26744              
- Misses       3216     3301      +85
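The coverage figures in the diff can be reproduced from the hit and line counts. The sketch below assumes coverage is simply hits divided by total lines; how Codecov rounds the displayed values is not covered here.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Returns line coverage as a percentage."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff above.
base = coverage_percent(hits=26744, lines=29960)  # master
head = coverage_percent(hits=26744, lines=30045)  # this PR
delta = head - base
print(f"base={base:.4f}% head={head:.4f}% delta={delta:.4f}")
```

The hit count is unchanged while the line count grows by 85, so the drop comes entirely from the 85 uncovered lines in torch_geometric/nn/models/argo.py.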


Add files via upload

b1f74e8

github-actions bot added the example label Jan 31, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

2123626

for more information, see https://pre-commit.ci

jasonlin316 changed the title ~~Add files via upload~~ ARGO: an easy-to-use runtime to improve GNN training performance on multi-core processors Jan 31, 2024

jasonlin316 and others added 3 commits January 30, 2024 18:38

Update argo.py

8e91078

Update CHANGELOG.md

62e39e2

[pre-commit.ci] auto fixes from pre-commit.com hooks

40304fb

for more information, see https://pre-commit.ci

rusty1s assigned jasonlin316 Jan 31, 2024

rusty1s added feature 0 - Priority P0 labels Jan 31, 2024

jasonlin316 marked this pull request as ready for review January 31, 2024 20:32

jasonlin316 requested a review from wsad1 as a code owner January 31, 2024 20:32

Update argo.py

853d777

wsad1 reviewed Feb 9, 2024


jasonlin316 added 3 commits February 9, 2024 16:42

Update argo.py

ca77bc5

update doc

8fce3d8

Update argo.py

328a698

akihironitta reviewed Feb 10, 2024


bug fix and update doc

de43f99

akihironitta reviewed Feb 13, 2024


Update doc

00ff9a1

rusty1s and others added 3 commits February 16, 2024 11:26

Merge branch 'master' into master

11f3735

[pre-commit.ci] auto fixes from pre-commit.com hooks

ceaf73a

for more information, see https://pre-commit.ci

update

7957f21

rusty1s reviewed Feb 16, 2024


update

c45466e

jasonlin316 requested a review from EdisonLeeeee as a code owner February 16, 2024 10:41

github-actions bot removed the example label Feb 16, 2024

github-actions bot added the nn label Feb 16, 2024

update

15c2080
