Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARGO: an easy-to-use runtime to improve GNN training performance on multi-core processors #8841

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

jasonlin316
Copy link

Overview

The GNN training performance on multi-core processors is limited as the current design cannot scale well. We propose a runtime system named ARGO that can improve the scalability of GNN training on multi-core processors. On a CPU platform where the original program can only scale to 16 cores (meaning that no performance improvement is achieved if more than 16 cores are applied), ARGO can further scale the design up to 64 cores, achieving up to 5x speedup compared to the original design without ARGO.

@jasonlin316 jasonlin316 changed the title Add files via upload ARGO: an easy-to-use runtime to improve GNN training performance on multi-core processors Jan 31, 2024
Copy link
Member

@wsad1 wsad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great thanks for adding!
Need your help with a few things to merge this PR

  1. Add a link to the paper and also give a short explanation of how ARGO speeds things up.
  2. Could you share some benchmarks on pyg models. Showing how much ARGO speeds up py models on high core cpus?
  3. Add type hints to all functions and also a short doc for each function.

examples/argo.py Outdated
self.acq_func = 'EI' # acqusition function of the auto-tuner
self.counter = [0]

def core_binder(self, num_cpu_proc, n_samp, n_train, rank):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints.

examples/argo.py Outdated
acq_func=self.acq_func)
return result

def mp_engine(self, x, train, args, ep): # Multi-Process Engine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hints here too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions!
I added type hints to each function and provided a short doc.
The description of ARGO is available on top of the file now.
For benchmarking results, they are available in the paper (Fig. 10 and 11).
Please let me know if you need anything else from me :)

Copy link
Member

@akihironitta akihironitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks interesting! Do you think we could have an end-to-end example?

examples/argo.py Outdated
Comment on lines 33 to 34
Parameters
----------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we follow the same docstring style as the PyG codebase?

Suggested change
Parameters
----------
Args:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...I can change the term to "Args." But I think the PyG doc uses "PARAMETERS"?
For example, the doc here: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The built docs show "PARAMETERS", but in the codebase, we follow this docstring style afaik: https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings Please feel free to confirm how the docstrings are written in the codebase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I have updated the document accordingly.

examples/argo.py Outdated
random_state: int
Number of random initializations before searching

acq_function: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like acq_function isn't used. Mind updating the docsting or the code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

@jasonlin316
Copy link
Author

Looks interesting! Do you think we could have an end-to-end example?

Yes. I do have end-to-end example: https://github.com/jasonlin316/ARGO/tree/main/PyG
Please see flickr_example_ARGO.py which enables ARGO on the flickr_example.py.

Copy link
Member

@akihironitta akihironitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should have a complete example because only looking at this class may not be clear enough for most PyG users to adapt it to their use cases IMHO.

@jasonlin316
Copy link
Author

I feel like we should have a complete example because only looking at this class may not be clear enough for most PyG users to adapt it to their use cases IMHO.

Yes, I agree having a complete example is important for the users.
@rusty1s Do you want to comment on this? I recall the plan is to add ARGO to the PyG package, and then add an example. Should we finalize this PR first, and then create another PR for the end-to-end example?

examples/argo.py Outdated
import time
from typing import Callable, Tuple

import dgl.multiprocessing as dmp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? We don't want to have a dgl dependency in PyG. Can't we use Python/PyTorch for this?

@github-actions github-actions bot added the nn label Feb 16, 2024
@rusty1s
Copy link
Member

rusty1s commented Feb 16, 2024

@jasonlin316 I moved to torch_geometric package directly, moved the doc-string and adjusted input arguments. If you have time, can you follow up with adjusting docs/arguments in the remainder? In addition, how hard would it be to test ARGO in test/nn/models/test_argo.py?

Copy link

codecov bot commented Feb 16, 2024

Codecov Report

Attention: 85 lines in your changes are missing coverage. Please review.

Comparison is base (1b195a0) 89.26% compared to head (15c2080) 89.01%.
Report is 2 commits behind head on master.

Files Patch % Lines
torch_geometric/nn/models/argo.py 0.00% 85 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8841      +/-   ##
==========================================
- Coverage   89.26%   89.01%   -0.26%     
==========================================
  Files         468      469       +1     
  Lines       29960    30045      +85     
==========================================
  Hits        26744    26744              
- Misses       3216     3301      +85     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants