
Skip Gram with Negative Sampling #193

Open
andland opened this issue Jun 14, 2017 · 4 comments
andland commented Jun 14, 2017

https://arxiv.org/pdf/1705.09755v1.pdf

I recently posted a paper to arXiv showing that word2vec's Skip Gram with Negative Sampling (SGNS) algorithm is a weighted logistic PCA. Within that framework, SGNS can be trained on the same term-context matrix that is used for GloVe. Training could use the same AdaGrad procedure, only with a different loss function and gradients, and sampling all of the elements of the matrix rather than just the non-zeroes.
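To make the framework concrete, here is a minimal NumPy sketch of SGNS viewed as weighted logistic PCA on a toy dense co-occurrence matrix. Plain gradient descent stands in for AdaGrad, and the matrix size, negative-sampling weight, and learning rate are all illustrative, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-context co-occurrence counts (dense for clarity; real matrices are sparse).
X = rng.poisson(1.0, size=(6, 6)).astype(float)
n_i = X.sum(axis=1, keepdims=True)    # word marginal counts, shape (6, 1)
n_j = X.sum(axis=0, keepdims=True)    # context marginal counts, shape (1, 6)
N = X.sum()
k = 5.0                               # number of negative samples per positive

# Weight on the "negative" (unobserved) side of each cell's logistic loss,
# taken from the product of unigram marginals as in the SGNS objective.
neg = k * n_i * n_j / N

W = 0.1 * rng.standard_normal((6, 3))   # word embeddings
C = 0.1 * rng.standard_normal((6, 3))   # context embeddings

def loss(W, C):
    S = W @ C.T
    # Weighted logistic loss summed over ALL cells, not just the non-zeros.
    return float(np.sum(X * np.logaddexp(0.0, -S) + neg * np.logaddexp(0.0, S)))

before = loss(W, C)
lr = 0.02
for _ in range(300):
    S = W @ C.T
    sigma = 1.0 / (1.0 + np.exp(-S))
    G = neg * sigma - X * (1.0 - sigma)   # d(loss)/dS, cell-wise
    W, C = W - lr * (G @ C), C - lr * (G.T @ W)
after = loss(W, C)
```

Each cell contributes a logistic loss pulling the dot product up in proportion to the observed count and down in proportion to the negative-sampling weight, which is where the factorization differs from GloVe's least-squares objective over non-zero cells.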

Is SGNS something you are interested in including in the text2vec package, or are you happy with GloVe?

Thanks

dselivanov (Owner) commented:
Thanks! The article looks very interesting. In my experience, SGNS and GloVe usually perform very similarly, but it would be interesting to compare them in more detail.

andland commented Jun 14, 2017

I agree they are largely similar, but an advantage of SGNS is that it does better for rarely occurring words. As the Swivel paper puts it:
"GloVe is under-constrained: there is no penalty for placing unobserved but unrelated embeddings near to one another."
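The difference can be seen cell by cell: a GloVe-style least-squares loss sums only over observed (non-zero) cells, so an unobserved pair contributes nothing, while an SGNS-style logistic loss also charges unobserved cells for sitting close together. A toy illustration (the weighting functions here are simplified stand-ins, not either paper's exact objective):

```python
import numpy as np

# Illustrative 3x3 co-occurrence counts with an unobserved pair at cell (2, 2).
X = np.array([[4., 2., 0.],
              [2., 3., 1.],
              [0., 1., 0.]])

W = np.ones((3, 2))          # deliberately identical embeddings, so every
C = np.ones((3, 2))          # pair of words sits at the same dot product
S = W @ C.T                  # all dot products equal 2.0

# GloVe-style cell losses: weighted squares over NON-ZERO cells only.
f = np.minimum((X / X.max()) ** 0.75, 1.0)
glove_cell = np.where(X > 0,
                      f * (S - np.log(np.where(X > 0, X, 1.0))) ** 2,
                      0.0)

# SGNS-style cell losses: logistic loss over ALL cells, zeros included.
k = 5.0
neg = k * np.outer(X.sum(1), X.sum(0)) / X.sum()
sgns_cell = X * np.logaddexp(0, -S) + neg * np.logaddexp(0, S)

# The unobserved pair is free under GloVe but penalized under SGNS.
print(glove_cell[2, 2], sgns_cell[2, 2])
```

The unobserved-but-unrelated pair at cell (2, 2) incurs zero GloVe loss no matter how close the embeddings are, whereas the SGNS-style loss pushes its dot product down, which is exactly the under-constraint the Swivel quote describes.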

dselivanov commented Jun 15, 2017

Yes, I remember this. But a clear advantage of GloVe is that its complexity is O(nnz) rather than O(D^2). As I understand it, the proposed SGNS and SGNS-LS also suffer from O(D^2) complexity.
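The gap between the two is easy to quantify on a toy sparse matrix (the vocabulary size and density below are illustrative; real vocabularies and sparsity patterns are far larger and more skewed):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1000                          # toy vocabulary size
# Sparse co-occurrence pattern: roughly 0.5% of cells observed.
X = rng.poisson(0.005, size=(D, D))

nnz = np.count_nonzero(X)         # cells a GloVe epoch touches
full = D * D                      # cells an all-elements SGNS epoch touches
print(nnz, full, full / nnz)
```

Per epoch that is a roughly 200x difference here, and real term-context matrices are typically far sparser, so the per-epoch gap only grows with D.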

andland commented Jun 15, 2017

That is a downside. However, my intuition is that the number of parameter updates matters more than the number of epochs; see, for example, Figure 5 of the BPR paper. That is, the algorithm may converge in a similar number of parameter updates as GloVe. This is mostly speculation, though.
