want to pretrain on my own datasets #21

Juicechen95 · 2020-11-11T10:02:18Z

Hi, acbull~ I think this algorithm is very interesting and I really want to test it on my own graph dataset. Is there any advice or tips on how to prepare my own pretraining graph data? Thank you very much~~

acbull · 2020-11-11T23:44:02Z

Hi:

You can simply follow the same paradigm as preprocess_.py to parse your graph into our data format, and then just run pretrain_.py over the parsed graph.

Or, if you want to merge our code into your own system, maybe you can rewrite the data structure, but everything else is similar.
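To make the "parse your graph into a data format" step more concrete, here is a minimal sketch of turning a typed edge list into a heterogeneous adjacency structure. This is an illustration only, not the repository's actual data structure: the function name `build_typed_graph`, the tuple layout of `edges`, and the key order `(dst_type, src_type, relation)` are all assumptions made for this example, so check the preprocessing script for the real format before relying on it.

```python
from collections import defaultdict


def build_typed_graph(edges):
    """Index a typed edge list (hypothetical layout, not the repo's format).

    edges: iterable of (src_type, src_id, relation, dst_type, dst_id).
    Returns (node_index, adjacency) where
      node_index[node_type][raw_id] -> contiguous integer index per type
      adjacency[(dst_type, src_type, relation)][dst_index] -> [src_index, ...]
    """
    node_index = defaultdict(dict)
    adjacency = defaultdict(lambda: defaultdict(list))

    def index_of(node_type, raw_id):
        # Assign the next free integer index the first time an id is seen.
        table = node_index[node_type]
        if raw_id not in table:
            table[raw_id] = len(table)
        return table[raw_id]

    for src_type, src_id, relation, dst_type, dst_id in edges:
        s = index_of(src_type, src_id)
        d = index_of(dst_type, dst_id)
        # Store incoming edges per target node, grouped by edge type.
        adjacency[(dst_type, src_type, relation)][d].append(s)

    return node_index, adjacency


if __name__ == "__main__":
    toy_edges = [
        ("paper", "p1", "written_by", "author", "a1"),
        ("paper", "p2", "written_by", "author", "a1"),
        ("paper", "p1", "cites", "paper", "p2"),
    ]
    nodes, adj = build_typed_graph(toy_edges)
    print(dict(nodes))
    print({k: dict(v) for k, v in adj.items()})
```

Keeping a separate contiguous index per node type makes it easy to attach one feature matrix per type later on.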

Juicechen95 · 2020-11-12T02:24:51Z

Thank you very much for your advice, I will try it~

Juicechen95 · 2020-11-12T03:03:39Z

My own dataset contains more than 10 million nodes. I see in the paper that the OAG dataset contains more than 178 million nodes, but I just found that only about 1 million nodes are used for pretraining according to pretrain_OAG.py. Is that number right?
I don't know why only a small part of OAG is used for pretraining. I also wonder how large a dataset this method can handle, because my dataset is very large. Is that realistic? Will I run into memory problems, or be unable to sample the small subgraphs? Or do you have any ideas about how to do pretraining on a very large dataset?
I will be very grateful for your answer~~ Thanks a lot!!!

acbull · 2020-11-12T20:34:39Z

Since we use subgraph sampling during training, the size of the pretraining graph does not matter much. In experiments, I also tried the whole OAG dataset, but it is too big, so I didn't provide it on Google Drive. So you can certainly use our code to do pretraining on a super-large dataset.
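To illustrate why sampling keeps the per-batch cost independent of the full graph size, here is a minimal sketch of a generic uniform neighbor sampler. It is not the sampler used in this repository; `sample_subgraph`, `depth`, `fanout`, and the `neighbors` callback are hypothetical names chosen for this example. The sampled set can never exceed `len(seeds) * (1 + fanout + fanout**2 + ...)` nodes, however large the underlying graph is.

```python
import random


def sample_subgraph(neighbors, seeds, depth=2, fanout=3, rng=None):
    """Uniform layer-wise neighbor sampling (generic sketch).

    neighbors: callable mapping a node to a list of its neighbors, so the
               full graph never has to be materialized in memory here.
    seeds:     nodes the batch starts from.
    Returns the set of sampled nodes.
    """
    rng = rng or random.Random(0)
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            candidates = neighbors(node)
            # Keep at most `fanout` neighbors per node.
            picked = rng.sample(candidates, min(fanout, len(candidates)))
            for n in picked:
                if n not in sampled:
                    sampled.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return sampled


if __name__ == "__main__":
    # An implicit ring graph with 10 million nodes; only touched nodes are visited.
    num_nodes = 10_000_000

    def ring(v):
        return [(v - 1) % num_nodes, (v + 1) % num_nodes]

    sub = sample_subgraph(ring, seeds=[0, 5_000_000])
    print(len(sub))
```

Memory for the full graph (or access to it on disk) is still needed to answer the `neighbors` lookups; the sketch only shows that the training batch itself stays small.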

Juicechen95 · 2020-11-16T03:40:32Z

Thank you for your patient reply~I will try it.
