ARGO: an easy-to-use runtime to improve GNN training performance on multi-core processors #8841
base: master
Conversation
for more information, see https://pre-commit.ci
This is great, thanks for adding!
I need your help with a few things to merge this PR:
- Add a link to the paper and give a short explanation of how ARGO speeds things up.
- Could you share some benchmarks on PyG models, showing how much ARGO speeds them up on high-core-count CPUs?
- Add type hints to all functions, plus a short docstring for each function.
examples/argo.py
Outdated
self.acq_func = 'EI'  # acquisition function of the auto-tuner
self.counter = [0]

def core_binder(self, num_cpu_proc, n_samp, n_train, rank):
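For context, `'EI'` refers to the Expected Improvement acquisition function commonly used by Bayesian-optimization auto-tuners. As a standalone illustration (the PR itself presumably delegates this to the tuning library), EI under a Gaussian surrogate posterior can be computed with the standard library alone:

```python
import math

def expected_improvement(mu: float, sigma: float, f_best: float,
                         xi: float = 0.01) -> float:
    """Expected Improvement for a minimization problem.

    mu, sigma: surrogate posterior mean/std at a candidate point.
    f_best: best (lowest) objective value observed so far.
    xi: small exploration margin.
    """
    if sigma == 0.0:
        return 0.0  # no uncertainty -> no expected improvement
    z = (f_best - mu - xi) / sigma
    # Standard normal CDF and PDF via math.erf / math.exp.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_best - mu - xi) * cdf + sigma * pdf
```

Candidates whose posterior mean sits well below the incumbent `f_best` score highly, which is what drives the tuner toward faster configurations.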
Add type hints.
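For instance, the signature above could gain hints along these lines. The body shown is a hypothetical sketch of core-range partitioning, not the PR's actual implementation, and `self` is dropped to keep it standalone:

```python
from typing import List

def core_binder(num_cpu_proc: int, n_samp: int, n_train: int,
                rank: int) -> List[int]:
    """Return the core IDs assigned to the process with the given rank.

    Args:
        num_cpu_proc: Total number of training processes.
        n_samp: Cores reserved per process for mini-batch sampling.
        n_train: Cores reserved per process for model training.
        rank: Rank of the calling process.
    """
    cores_per_proc = n_samp + n_train
    start = rank * cores_per_proc
    # Each rank gets a contiguous, non-overlapping slice of cores.
    return list(range(start, start + cores_per_proc))
```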
examples/argo.py
Outdated
acq_func=self.acq_func)
return result

def mp_engine(self, x, train, args, ep):  # Multi-Process Engine
Type hints here too.
Thanks for your suggestions!
I added type hints to each function and provided a short doc.
The description of ARGO is available on top of the file now.
The benchmarking results are available in the paper (Figs. 10 and 11).
Please let me know if you need anything else from me :)
Looks interesting! Do you think we could have an end-to-end example?
examples/argo.py
Outdated
Parameters
----------
Can we follow the same docstring style as the PyG codebase?
- Parameters
- ----------
+ Args:
Hmm... I can change the term to "Args", but I think the PyG docs use "PARAMETERS"?
For example, the doc here: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html
The built docs show "PARAMETERS", but in the codebase we follow this docstring style, AFAIK: https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings. Please feel free to confirm how the docstrings are written in the codebase.
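For reference, converting the NumPy-style header in the diff to the Google style the reviewer points at would look roughly like this (the function and parameter names here are illustrative, not taken from the PR):

```python
def bind_cores(num_cpu_proc: int, rank: int) -> None:
    """Bind the current process to its share of CPU cores.

    Args:
        num_cpu_proc (int): Total number of training processes.
        rank (int): Rank of the calling process.
    """
```

The `Parameters` / `----------` underline header is replaced by an indented `Args:` section, which is what Sphinx's napoleon extension renders as "PARAMETERS" in the built docs.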
Understood. I have updated the document accordingly.
examples/argo.py
Outdated
random_state: int
    Number of random initializations before searching

acq_function: str
Looks like acq_function isn't used. Mind updating the docstring or the code?
Sure thing.
Yes, I do have an end-to-end example: https://github.com/jasonlin316/ARGO/tree/main/PyG
I feel like we should have a complete example, because looking at this class alone may not be clear enough for most PyG users to adapt it to their use cases, IMHO.
Yes, I agree having a complete example is important for the users.
examples/argo.py
Outdated
import time
from typing import Callable, Tuple

import dgl.multiprocessing as dmp
Is this needed? We don't want to have a dgl dependency in PyG. Can't we use Python/PyTorch for this?
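One way to drop the dgl dependency, as the reviewer suggests, is to launch the per-rank workers with Python's standard multiprocessing. A minimal sketch under that assumption (function names are illustrative, not the PR's actual code; real PyTorch workloads usually prefer `torch.multiprocessing` with the "spawn" start method):

```python
import multiprocessing as mp
from typing import List, Tuple

def _train(rank: int, world_size: int, queue: "mp.Queue") -> None:
    # Placeholder for the real per-process GNN training loop.
    queue.put((rank, f"done ({rank + 1}/{world_size})"))

def launch(world_size: int) -> List[Tuple[int, str]]:
    # "fork" keeps this sketch self-contained; with PyTorch you would
    # typically use torch.multiprocessing and the "spawn" method instead.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=_train, args=(r, world_size, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    # Drain the queue before joining to avoid blocking on buffered puts.
    results = sorted(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return results
```

`dgl.multiprocessing` is itself a thin wrapper around multiprocess spawning, so the swap is mostly a matter of changing the import and launch call.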
@jasonlin316 I moved to
Codecov Report

Attention:

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8841      +/-   ##
==========================================
- Coverage   89.26%   89.01%   -0.26%
==========================================
  Files         468      469       +1
  Lines       29960    30045      +85
==========================================
  Hits        26744    26744
- Misses       3216     3301      +85

☔ View full report in Codecov by Sentry.
Overview
GNN training performance on multi-core processors is limited because the current design scales poorly: beyond a certain core count, adding cores yields no further speedup. We propose a runtime system named ARGO that improves the scalability of GNN training on multi-core processors. On a CPU platform where the original program scales only to 16 cores (i.e., no performance improvement is achieved when more than 16 cores are used), ARGO further scales the design up to 64 cores, achieving up to a 5x speedup over the original design without ARGO.