
About the training time #33

Open
SpaceLearner opened this issue Jul 2, 2021 · 0 comments

@SpaceLearner
Hi, thank you for your excellent work. However, when I run the code with the default settings for pretraining on OAG_CS, the time per epoch is much longer than reported: an epoch takes around 40 minutes, i.e. 40 * 400 / 60 ≈ 266.7 hours for 400 epochs, which is far longer than the 12 hours stated in the paper. My machine has a Tesla P100 and 8 × Xeon(R) CPU E5-2690 v4. How can I solve this problem? (A rough per-epoch estimate based on the per-batch timings is sketched after the log.)

The following is the log:

+--------------------+-------------------------------------------+
| Parameter          | Value                                     |
+--------------------+-------------------------------------------+
| attr_ratio         | 0.500                                     |
+--------------------+-------------------------------------------+
| attr_type          | text                                      |
+--------------------+-------------------------------------------+
| neg_samp_num       | 255                                       |
+--------------------+-------------------------------------------+
| queue_size         | 256                                       |
+--------------------+-------------------------------------------+
| w2v_dir            | /data/data0/gjy/dataset/OAG/w2v_all       |
+--------------------+-------------------------------------------+
| data_dir           | /data/data0/gjy/dataset/OAG/graph_CS.pk   |
+--------------------+-------------------------------------------+
| pretrain_model_dir | /data/data0/gjy/GPT-GNN/saved/OAG/gnn.pkl |
+--------------------+-------------------------------------------+
| cuda               | 7                                         |
+--------------------+-------------------------------------------+
| sample_depth       | 3                                         |
+--------------------+-------------------------------------------+
| sample_width       | 128                                       |
+--------------------+-------------------------------------------+
| conv_name          | hgt                                       |
+--------------------+-------------------------------------------+
| n_hid              | 400                                       |
+--------------------+-------------------------------------------+
| n_heads            | 8                                         |
+--------------------+-------------------------------------------+
| n_layers           | 3                                         |
+--------------------+-------------------------------------------+
| prev_norm          | 1                                         |
+--------------------+-------------------------------------------+
| last_norm          | 1                                         |
+--------------------+-------------------------------------------+
| dropout            | 0.200                                     |
+--------------------+-------------------------------------------+
| max_lr             | 0.001                                     |
+--------------------+-------------------------------------------+
| scheduler          | cycle                                     |
+--------------------+-------------------------------------------+
| n_epoch            | 200                                       |
+--------------------+-------------------------------------------+
| n_pool             | 8                                         |
+--------------------+-------------------------------------------+
| n_batch            | 32                                        |
+--------------------+-------------------------------------------+
| batch_size         | 256                                       |
+--------------------+-------------------------------------------+
| clip               | 0.500                                     |
+--------------------+-------------------------------------------+
cuda:7
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 68.7s
Epoch: 1, (1 / 41) 45.3s LR: 0.00005 Train Loss: (4.773, 9.771) Valid Loss: (4.762, 8.815) NDCG: 0.314 Norm: 20.012 queue: 1
UPDATE!!!
Data Preparation: 57.1s
Epoch: 1, (2 / 41) 40.3s LR: 0.00005 Train Loss: (4.594, 8.514) Valid Loss: (4.532, 7.968) NDCG: 0.353 Norm: 20.025 queue: 1
UPDATE!!!
Data Preparation: 29.7s
Epoch: 1, (3 / 41) 38.4s LR: 0.00006 Train Loss: (4.469, 7.768) Valid Loss: (4.628, 7.167) NDCG: 0.359 Norm: 20.035 queue: 1
UPDATE!!!
Data Preparation: 17.0s
Epoch: 1, (4 / 41) 36.8s LR: 0.00006 Train Loss: (4.426, 7.283) Valid Loss: (4.453, 6.991) NDCG: 0.367 Norm: 20.043 queue: 1
UPDATE!!!
Data Preparation: 13.0s
Epoch: 1, (5 / 41) 36.8s LR: 0.00007 Train Loss: (4.375, 7.060) Valid Loss: (4.509, 6.793) NDCG: 0.365 Norm: 20.047 queue: 1
UPDATE!!!
Data Preparation: 12.3s
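
For reference, here is the rough arithmetic behind the estimate above as a minimal Python sketch. The batch count of 41 per epoch is taken from this log; the per-batch averages are illustrative round numbers read off this run, not exact measurements.

```python
# Back-of-the-envelope estimate of total pretraining time from the log above.
# The per-batch averages below are rough values read off this run, not exact measurements.

batches_per_epoch = 41        # the log prints "(i / 41)" for each batch
train_sec_per_batch = 40.0    # forward/backward time per batch (~37-45 s in the log)
prep_sec_per_batch = 15.0     # data-preparation time once the sampling pool warms up

sec_per_epoch = batches_per_epoch * (train_sec_per_batch + prep_sec_per_batch)
print(f"Estimated time per epoch: {sec_per_epoch / 60:.1f} min")    # ~37.6 min

n_epoch = 400                 # as in the calculation above (this run's table shows n_epoch = 200)
total_hours = sec_per_epoch * n_epoch / 3600
print(f"Estimated total pretraining time: {total_hours:.1f} h")     # ~250 h, vs. ~12 h in the paper
```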
