About the experiment in GPT-GNN #19

Open
Mobzhang opened this issue Oct 28, 2020 · 4 comments

Comments

@Mobzhang

Hi authors,
Thanks for your great work on pre-training GNNs; it makes GNN models applicable to larger graphs. I have some questions about the experimental part. I looked at your code on GitHub and noticed it includes a pre-training stage and a fine-tuning stage, and I have a question about the experiments in your paper.

  1. How do you conduct the experiments with GraphSAGE and GAE? If I split the data into 0-70% pre-train, 70-80% train, 80-90% valid, and 90-100% test, should I feed the 0-80% portion into GraphSAGE and GAE as training data?
    Looking forward to your reply, thank you!
@Mobzhang
Author

And if I use no_pretrain, I see from your code that finetune_reddit.py only considers train_data, valid_data, and test_data. Am I right?

@acbull
Owner

acbull commented Oct 28, 2020

For all of the pre-training baselines (including GAE, GraphSAGE-unsuper, and our method), the setting follows the pre-train/fine-tune paradigm: we first pre-train the model with the self-supervised task on the pre-training data (in your example, 0-70%), then fine-tune the pre-trained model on the training set (70-80%), do model selection on the validation set, and report generalization performance on the test set.

Yes, for no_pretrain, we don't leverage the pre-training data.
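
To make the split concrete, here is a minimal sketch of the four-way partition and the two phases. The random node-level split and the stub comments are only illustrative (not taken from the repo); the point is that pre-training and fine-tuning touch disjoint portions of the data, and no_pretrain simply skips phase 1.

```python
import numpy as np

# Hypothetical node ids standing in for the full graph; the real splits may be
# made by time or by field rather than uniformly at random.
num_nodes = 10_000
idx = np.random.default_rng(0).permutation(num_nodes)

# 0-70% pre-train, 70-80% train, 80-90% valid, 90-100% test
pretrain_idx = idx[: int(0.7 * num_nodes)]
train_idx    = idx[int(0.7 * num_nodes): int(0.8 * num_nodes)]
valid_idx    = idx[int(0.8 * num_nodes): int(0.9 * num_nodes)]
test_idx     = idx[int(0.9 * num_nodes):]

# Phase 1: self-supervised pre-training (GPT-GNN generative task, GAE link
# reconstruction, or GraphSAGE unsupervised loss) touches pretrain_idx only.
# Phase 2: fine-tune the pre-trained encoder with downstream labels on
# train_idx, pick the checkpoint by valid_idx performance, report on test_idx.
# The no_pretrain baseline skips phase 1 and runs phase 2 from random weights.
```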

@Mobzhang
Author

Thanks very much for your reply! I'm still a little confused about GraphSAGE and GAE. Did you feed all the pre-training data into the model, or did you sample it as described in your paper?

@acbull
Owner

acbull commented Oct 29, 2020

Pre-training uses all of the pre-training data, but we do mini-batch training via subgraph sampling to avoid memory issues (the whole graph is too large for GPU memory).
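
A minimal sketch of that mini-batch scheme, with a hypothetical `sample_subgraph` helper standing in for whatever sampler the repo actually uses: every pre-training node is eventually a seed, but each step only materialises a small sampled neighbourhood, so the full graph never has to fit on the GPU at once.

```python
import random

def sample_subgraph(adj, seed_nodes, num_hops=2, fanout=10):
    """Hypothetical sampler: expand the seeds by at most `fanout` random
    neighbours per hop and return the induced node set."""
    nodes, frontier = set(seed_nodes), set(seed_nodes)
    for _ in range(num_hops):
        nxt = set()
        for u in frontier:
            neigh = adj.get(u, [])
            nxt.update(random.sample(neigh, min(fanout, len(neigh))))
        frontier = nxt - nodes
        nodes |= frontier
    return nodes

# Toy adjacency list; in the real code the graph and the sampler come from
# the repo's data utilities.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

pretrain_nodes = list(adj)        # all nodes in the pre-training graph
random.shuffle(pretrain_nodes)
batch_size = 2
for i in range(0, len(pretrain_nodes), batch_size):
    seeds = pretrain_nodes[i:i + batch_size]
    sub_nodes = sample_subgraph(adj, seeds)
    # run the self-supervised forward/backward pass on this subgraph only
    # (model code omitted)
```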
