
About the training time #33

Open
SpaceLearner opened this issue Jul 2, 2021 · 0 comments

@SpaceLearner
Hi, thank you for your excellent work. However, when I run the code with the default settings for pretraining on OAG_CS, the time per epoch is much longer than reported: an epoch takes around 40 minutes, i.e. 40 * 400 / 60 ≈ 266.7 hours for 400 epochs, which is far longer than the 12 hours stated in the paper. My machine has a Tesla P100 and 8 × Xeon(R) CPU E5-2690 v4. How can I solve this problem? (A rough per-epoch estimate based on the per-batch timings is sketched after the log.)

The following is the log:

+--------------------+-------------------------------------------+
| Parameter          | Value                                     |
+--------------------+-------------------------------------------+
| attr_ratio         | 0.500                                     |
+--------------------+-------------------------------------------+
| attr_type          | text                                      |
+--------------------+-------------------------------------------+
| neg_samp_num       | 255                                       |
+--------------------+-------------------------------------------+
| queue_size         | 256                                       |
+--------------------+-------------------------------------------+
| w2v_dir            | /data/data0/gjy/dataset/OAG/w2v_all       |
+--------------------+-------------------------------------------+
| data_dir           | /data/data0/gjy/dataset/OAG/graph_CS.pk   |
+--------------------+-------------------------------------------+
| pretrain_model_dir | /data/data0/gjy/GPT-GNN/saved/OAG/gnn.pkl |
+--------------------+-------------------------------------------+
| cuda               | 7                                         |
+--------------------+-------------------------------------------+
| sample_depth       | 3                                         |
+--------------------+-------------------------------------------+
| sample_width       | 128                                       |
+--------------------+-------------------------------------------+
| conv_name          | hgt                                       |
+--------------------+-------------------------------------------+
| n_hid              | 400                                       |
+--------------------+-------------------------------------------+
| n_heads            | 8                                         |
+--------------------+-------------------------------------------+
| n_layers           | 3                                         |
+--------------------+-------------------------------------------+
| prev_norm          | 1                                         |
+--------------------+-------------------------------------------+
| last_norm          | 1                                         |
+--------------------+-------------------------------------------+
| dropout            | 0.200                                     |
+--------------------+-------------------------------------------+
| max_lr             | 0.001                                     |
+--------------------+-------------------------------------------+
| scheduler          | cycle                                     |
+--------------------+-------------------------------------------+
| n_epoch            | 200                                       |
+--------------------+-------------------------------------------+
| n_pool             | 8                                         |
+--------------------+-------------------------------------------+
| n_batch            | 32                                        |
+--------------------+-------------------------------------------+
| batch_size         | 256                                       |
+--------------------+-------------------------------------------+
| clip               | 0.500                                     |
+--------------------+-------------------------------------------+
cuda:7
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 68.7s
Epoch: 1, (1 / 41) 45.3s LR: 0.00005 Train Loss: (4.773, 9.771) Valid Loss: (4.762, 8.815) NDCG: 0.314 Norm: 20.012 queue: 1
UPDATE!!!
Data Preparation: 57.1s
Epoch: 1, (2 / 41) 40.3s LR: 0.00005 Train Loss: (4.594, 8.514) Valid Loss: (4.532, 7.968) NDCG: 0.353 Norm: 20.025 queue: 1
UPDATE!!!
Data Preparation: 29.7s
Epoch: 1, (3 / 41) 38.4s LR: 0.00006 Train Loss: (4.469, 7.768) Valid Loss: (4.628, 7.167) NDCG: 0.359 Norm: 20.035 queue: 1
UPDATE!!!
Data Preparation: 17.0s
Epoch: 1, (4 / 41) 36.8s LR: 0.00006 Train Loss: (4.426, 7.283) Valid Loss: (4.453, 6.991) NDCG: 0.367 Norm: 20.043 queue: 1
UPDATE!!!
Data Preparation: 13.0s
Epoch: 1, (5 / 41) 36.8s LR: 0.00007 Train Loss: (4.375, 7.060) Valid Loss: (4.509, 6.793) NDCG: 0.365 Norm: 20.047 queue: 1
UPDATE!!!
Data Preparation: 12.3s
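
For reference, here is the rough arithmetic behind the estimate above as a minimal Python sketch. The batch count of 41 per epoch is taken from this log; the per-batch averages are illustrative round numbers read off this run, not exact measurements.

```python
# Back-of-the-envelope estimate of total pretraining time from the log above.
# The per-batch averages below are rough values read off this run, not exact measurements.

batches_per_epoch = 41        # the log prints "(i / 41)" for each batch
train_sec_per_batch = 40.0    # forward/backward time per batch (~37-45 s in the log)
prep_sec_per_batch = 15.0     # data-preparation time once the sampling pool warms up

sec_per_epoch = batches_per_epoch * (train_sec_per_batch + prep_sec_per_batch)
print(f"Estimated time per epoch: {sec_per_epoch / 60:.1f} min")    # ~37.6 min

n_epoch = 400                 # as in the calculation above (this run's table shows n_epoch = 200)
total_hours = sec_per_epoch * n_epoch / 3600
print(f"Estimated total pretraining time: {total_hours:.1f} h")     # ~250 h, vs. ~12 h in the paper
```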
