Calculating embeddings for new nodes after training #62
I am trying to run Cleora on a simple dataset. My TSV file follows the format `lead<\t>attribute`:

```
l1 <\t> a1
l2 <\t> a1
l1 <\t> a2
l3 <\t> a2
```
Leads are connected to some attributes.
I have Set A, which is used to train embeddings for all nodes (leads and attributes) in the set.
For new nodes in Set B, which has the same `lead<\t>attribute` format, I calculate embeddings using the following two methods. I then train an XGBoost model on the embeddings of all "leads" nodes in Set A and predict on the "leads" nodes of Set B to calculate the AUC.
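For reference, the train-on-A / predict-on-B evaluation can be sketched as below. This is a minimal sketch with made-up random embeddings and labels; scikit-learn's `GradientBoostingClassifier` stands in for `xgboost.XGBClassifier` (which exposes the same `fit`/`predict_proba` interface), and all array shapes and parameters are assumptions, not taken from the issue.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for xgboost.XGBClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Made-up data: 64-dim "lead" embeddings with synthetic labels.
X_train = rng.normal(size=(200, 64))          # Set A lead embeddings
y_train = (X_train[:, 0] > 0).astype(int)     # synthetic labels for the sketch
X_test = rng.normal(size=(100, 64))           # Set B lead embeddings
y_test = (X_test[:, 0] > 0).astype(int)

# Train on Set A embeddings, score AUC on Set B embeddings.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

Swapping in the real `XGBClassifier` only changes the model line; the fit/predict/AUC flow is identical.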
Method 1
I jointly train embeddings on the combined Set A and Set B, which gives embeddings for all "leads" nodes. On Set B, the AUC of the XGBoost model (trained on the "leads" embeddings of Set A) is ~0.8.
Method 2
I use the method suggested in closed issue #21: train the embeddings only on Set A. Then, for each "leads" node of Set B, I extract the embeddings of all attributes that lead is connected to, average them, and L2-normalize the result. With the XGBoost model trained on the Set A "leads" embeddings, I predict on the Set B "leads" embeddings. The AUC drops to ~0.65.
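The average-and-normalize step I use for Method 2 can be sketched as below. The function name `embed_new_lead` and the toy 4-dimensional attribute vectors are my own illustration, not part of Cleora's API.

```python
import numpy as np

def embed_new_lead(attr_ids, attr_embeddings):
    """Embed an unseen lead by averaging the embeddings of its
    attributes and L2-normalizing the mean vector."""
    vecs = np.stack([attr_embeddings[a] for a in attr_ids])
    avg = vecs.mean(axis=0)
    norm = np.linalg.norm(avg)
    return avg / norm if norm > 0 else avg

# Toy example: made-up 4-dim embeddings for attributes a1 and a2.
attr_embeddings = {
    "a1": np.array([1.0, 0.0, 0.0, 0.0]),
    "a2": np.array([0.0, 1.0, 0.0, 0.0]),
}
# A new lead connected to a1 and a2 gets the normalized mean of both.
vec = embed_new_lead(["a1", "a2"], attr_embeddings)
```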
Is there any reason for the drop in AUC with Method 2, which was suggested for calculating embeddings for incoming nodes on the fly? The alternative is Method 1, where I have to retrain on the whole graph every time new nodes arrive.
Thanks