running time on OAG_CS dataset #30

Open
yihong-chen opened this issue Apr 14, 2021 · 3 comments


yihong-chen commented Apr 14, 2021

Hi, thanks for providing the awesome code for GPT-GNN.

I am trying to run your code on the OAG_CS dataset, but I am not sure I have set it up correctly. The paper reports a pre-training time of about 10-12 hours for 400 epochs, whereas it takes much longer on my side. Could you specify the computational resources required? For example, how many CPUs would I need to reach a pre-training time of around 10 hours? The output of my run is attached below.

+--------------------+-------------------------------+
| Parameter          | Value                         |
+--------------------+-------------------------------+
| attr_ratio         | 0.500                         |
+--------------------+-------------------------------+
| attr_type          | text                          |
+--------------------+-------------------------------+
| neg_samp_num       | 255                           |
+--------------------+-------------------------------+
| queue_size         | 256                           |
+--------------------+-------------------------------+
| w2v_dir            | ./data/oag_output/w2v_all     |
+--------------------+-------------------------------+
| data_dir           | ./data/oag_output/graph_CS.pk |
+--------------------+-------------------------------+
| pretrain_model_dir | ./tmp/model/gta_all_cs3       |
+--------------------+-------------------------------+
| cuda               | 0                             |
+--------------------+-------------------------------+
| sample_depth       | 6                             |
+--------------------+-------------------------------+
| sample_width       | 128                           |
+--------------------+-------------------------------+
| conv_name          | hgt                           |
+--------------------+-------------------------------+
| n_hid              | 400                           |
+--------------------+-------------------------------+
| n_heads            | 8                             |
+--------------------+-------------------------------+
| n_layers           | 3                             |
+--------------------+-------------------------------+
| prev_norm          | 0                             |
+--------------------+-------------------------------+
| last_norm          | 0                             |
+--------------------+-------------------------------+
| dropout            | 0.200                         |
+--------------------+-------------------------------+
| max_lr             | 0.001                         |
+--------------------+-------------------------------+
| scheduler          | cycle                         |
+--------------------+-------------------------------+
| n_epoch            | 20                            |
+--------------------+-------------------------------+
| n_pool             | 8                             |
+--------------------+-------------------------------+
| n_batch            | 32                            |
+--------------------+-------------------------------+
| batch_size         | 256                           |
+--------------------+-------------------------------+
| clip               | 0.500                         |
+--------------------+-------------------------------+
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 80.1s
Epoch: 1, (1 / 41) 55.9s  LR: 0.00010 Train Loss: (5.129, 10.292)  Valid Loss: (5.082, 9.933)  NDCG: 0.306  Norm: 0.666  queue: 12
UPDATE!!!
Data Preparation: 23.3s
Epoch: 1, (2 / 41) 45.0s  LR: 0.00015 Train Loss: (4.877, 9.236)  Valid Loss: (4.861, 8.130)  NDCG: 0.320  Norm: 0.950  queue: 12
UPDATE!!!
Data Preparation: 34.6s
Epoch: 1, (3 / 41) 40.1s  LR: 0.00021 Train Loss: (4.776, 7.650)  Valid Loss: (4.899, 6.895)  NDCG: 0.327  Norm: 1.243  queue: 12
UPDATE!!!
Data Preparation: 37.3s
Epoch: 1, (4 / 41) 42.5s  LR: 0.00027 Train Loss: (4.716, 6.930)  Valid Loss: (4.697, 6.571)  NDCG: 0.334  Norm: 1.493  queue: 12
UPDATE!!!
Data Preparation: 33.9s
Epoch: 1, (5 / 41) 40.8s  LR: 0.00032 Train Loss: (4.635, 6.624)  Valid Loss: (4.614, 6.290)  NDCG: 0.341  Norm: 1.950  queue: 12
UPDATE!!!
Data Preparation: 38.9s
Epoch: 1, (6 / 41) 45.2s  LR: 0.00038 Train Loss: (4.572, 6.470)  Valid Loss: (4.568, 6.386)  NDCG: 0.357  Norm: 2.481  queue: 12
Data Preparation: 30.5s
Epoch: 1, (7 / 41) 42.7s  LR: 0.00044 Train Loss: (4.438, 6.391)  Valid Loss: (4.501, 6.224)  NDCG: 0.371  Norm: 2.532  queue: 12
UPDATE!!!
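For reference, the run above was launched roughly as follows (assuming the repo's pretrain_OAG.py accepts flags matching the parameter names printed in the table; the paths are from my local setup):

```
python pretrain_OAG.py \
    --data_dir ./data/oag_output/graph_CS.pk \
    --w2v_dir ./data/oag_output/w2v_all \
    --pretrain_model_dir ./tmp/model/gta_all_cs3 \
    --cuda 0 --n_pool 8 --n_batch 32 --batch_size 256 --n_epoch 20
```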
acbull (Owner) commented Apr 14, 2021

Hi,

From your log, it seems the bottleneck is the sampling, which runs on the CPU: the per-batch "Data Preparation" time is comparable to the training time itself. My previous setting was 8× Intel Xeon E5-2698 v4 @ 2.20GHz, but that machine was also running other experiments, so take it only as a reference.
My previous implementation of the sampling is not very efficient; I'll update it to make it faster later.
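The general idea is to overlap that CPU sampling with GPU training by preparing the next batches in a worker pool. A simplified sketch of the pattern (not the exact code in the repo; sample_subgraph here is a stand-in for the real sampler):

```python
import multiprocessing as mp
import time

def sample_subgraph(seed: int):
    """Stand-in for the CPU-bound sampler: the real one expands `seed`
    out to sample_depth hops, keeping at most sample_width neighbours
    per hop, and returns the packed subgraph data."""
    time.sleep(0.1)   # simulate sampling cost
    return seed       # the real sampler returns node/edge data

if __name__ == "__main__":
    n_pool, n_batch = 8, 32   # mirror the n_pool / n_batch args above
    with mp.Pool(n_pool) as pool:
        # Dispatch the sampling asynchronously so the GPU step for
        # batch t can overlap the CPU sampling for batch t+1.
        jobs = pool.map_async(sample_subgraph, range(n_batch))
        # ... the training step for the previous batch would run here ...
        batches = jobs.get()  # collect the next n_batch subgraphs
    print(f"prepared {len(batches)} subgraphs")
```

With more pool workers (and a faster per-sample implementation), the Data Preparation time shrinks relative to the GPU step.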

yihong-chen (Author) commented

Thank you for the quick reply. Looking forward to the more efficient version :)

SpaceLearner commented

Hi, is there any progress on the more efficient version?
