running time on OAG_CS dataset #30

Open
yihong-chen opened this issue Apr 14, 2021 · 3 comments


yihong-chen commented Apr 14, 2021

Hi, thanks for providing the awesome code for GPT-GNN.

I am trying to run your code on the OAG_CS dataset, but I am not sure I have set it up correctly. The paper reports a pre-training time of about 10-12 hours for 400 epochs, whereas it takes much longer on my side. Could you specify the computational resources required? For example, how many CPUs would I need to reach a pre-training time of around 10 hours? The output of my run is attached below.

+--------------------+-------------------------------+
| Parameter          | Value                         |
+--------------------+-------------------------------+
| attr_ratio         | 0.500                         |
+--------------------+-------------------------------+
| attr_type          | text                          |
+--------------------+-------------------------------+
| neg_samp_num       | 255                           |
+--------------------+-------------------------------+
| queue_size         | 256                           |
+--------------------+-------------------------------+
| w2v_dir            | ./data/oag_output/w2v_all     |
+--------------------+-------------------------------+
| data_dir           | ./data/oag_output/graph_CS.pk |
+--------------------+-------------------------------+
| pretrain_model_dir | ./tmp/model/gta_all_cs3       |
+--------------------+-------------------------------+
| cuda               | 0                             |
+--------------------+-------------------------------+
| sample_depth       | 6                             |
+--------------------+-------------------------------+
| sample_width       | 128                           |
+--------------------+-------------------------------+
| conv_name          | hgt                           |
+--------------------+-------------------------------+
| n_hid              | 400                           |
+--------------------+-------------------------------+
| n_heads            | 8                             |
+--------------------+-------------------------------+
| n_layers           | 3                             |
+--------------------+-------------------------------+
| prev_norm          | 0                             |
+--------------------+-------------------------------+
| last_norm          | 0                             |
+--------------------+-------------------------------+
| dropout            | 0.200                         |
+--------------------+-------------------------------+
| max_lr             | 0.001                         |
+--------------------+-------------------------------+
| scheduler          | cycle                         |
+--------------------+-------------------------------+
| n_epoch            | 20                            |
+--------------------+-------------------------------+
| n_pool             | 8                             |
+--------------------+-------------------------------+
| n_batch            | 32                            |
+--------------------+-------------------------------+
| batch_size         | 256                           |
+--------------------+-------------------------------+
| clip               | 0.500                         |
+--------------------+-------------------------------+
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 80.1s
Epoch: 1, (1 / 41) 55.9s  LR: 0.00010 Train Loss: (5.129, 10.292)  Valid Loss: (5.082, 9.933)  NDCG: 0.306  Norm: 0.666  queue: 12
UPDATE!!!
Data Preparation: 23.3s
Epoch: 1, (2 / 41) 45.0s  LR: 0.00015 Train Loss: (4.877, 9.236)  Valid Loss: (4.861, 8.130)  NDCG: 0.320  Norm: 0.950  queue: 12
UPDATE!!!
Data Preparation: 34.6s
Epoch: 1, (3 / 41) 40.1s  LR: 0.00021 Train Loss: (4.776, 7.650)  Valid Loss: (4.899, 6.895)  NDCG: 0.327  Norm: 1.243  queue: 12
UPDATE!!!
Data Preparation: 37.3s
Epoch: 1, (4 / 41) 42.5s  LR: 0.00027 Train Loss: (4.716, 6.930)  Valid Loss: (4.697, 6.571)  NDCG: 0.334  Norm: 1.493  queue: 12
UPDATE!!!
Data Preparation: 33.9s
Epoch: 1, (5 / 41) 40.8s  LR: 0.00032 Train Loss: (4.635, 6.624)  Valid Loss: (4.614, 6.290)  NDCG: 0.341  Norm: 1.950  queue: 12
UPDATE!!!
Data Preparation: 38.9s
Epoch: 1, (6 / 41) 45.2s  LR: 0.00038 Train Loss: (4.572, 6.470)  Valid Loss: (4.568, 6.386)  NDCG: 0.357  Norm: 2.481  queue: 12
Data Preparation: 30.5s
Epoch: 1, (7 / 41) 42.7s  LR: 0.00044 Train Loss: (4.438, 6.391)  Valid Loss: (4.501, 6.224)  NDCG: 0.371  Norm: 2.532  queue: 12
UPDATE!!!
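For reference, the run above was launched roughly as follows (assuming the repo's pretrain_OAG.py accepts flags matching the parameter names printed in the table; the paths are from my local setup):

```
python pretrain_OAG.py \
    --data_dir ./data/oag_output/graph_CS.pk \
    --w2v_dir ./data/oag_output/w2v_all \
    --pretrain_model_dir ./tmp/model/gta_all_cs3 \
    --cuda 0 --n_pool 8 --n_batch 32 --batch_size 256 --n_epoch 20
```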
acbull (Owner) commented Apr 14, 2021

Hi,

From your log, it seems the bottleneck is the sampling, which runs on the CPU: the per-batch "Data Preparation" time is comparable to the training time itself. My previous setting was 8× Intel Xeon E5-2698 v4 @ 2.20GHz, but that machine was also running other experiments, so take it only as a reference.
My previous implementation of the sampling is not very efficient; I'll update it to make it faster later.
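The general idea is to overlap that CPU sampling with GPU training by preparing the next batches in a worker pool. A simplified sketch of the pattern (not the exact code in the repo; sample_subgraph here is a stand-in for the real sampler):

```python
import multiprocessing as mp
import time

def sample_subgraph(seed: int):
    """Stand-in for the CPU-bound sampler: the real one expands `seed`
    out to sample_depth hops, keeping at most sample_width neighbours
    per hop, and returns the packed subgraph data."""
    time.sleep(0.1)   # simulate sampling cost
    return seed       # the real sampler returns node/edge data

if __name__ == "__main__":
    n_pool, n_batch = 8, 32   # mirror the n_pool / n_batch args above
    with mp.Pool(n_pool) as pool:
        # Dispatch the sampling asynchronously so the GPU step for
        # batch t can overlap the CPU sampling for batch t+1.
        jobs = pool.map_async(sample_subgraph, range(n_batch))
        # ... the training step for the previous batch would run here ...
        batches = jobs.get()  # collect the next n_batch subgraphs
    print(f"prepared {len(batches)} subgraphs")
```

With more pool workers (and a faster per-sample implementation), the Data Preparation time shrinks relative to the GPU step.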

yihong-chen (Author) commented

Thank you for the quick reply. Looking forward to the more efficient version :)

SpaceLearner commented

Hi, is there any progress on the more efficient version?
