Word2Vec skip-gram model with negative sampling, implemented in Python 3

Huixxi/NLP_Word2Vec

NLP_Word2Vec: Skip-Gram Model

I implemented a classic word2vec model, the skip-gram model with negative sampling as the optimization method, by hand in pure Python 3, using the TED-Talks-Dataset as the training set. To evaluate the final embedding vectors, I measured their accuracy on the TOEFL Synonym Questions Dataset.
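As a rough illustration of how a synonym test can score embeddings, here is a minimal sketch in pure Python. It assumes each question is a target word plus a list of candidate answers, and that the predicted synonym is the candidate with the highest cosine similarity to the target. The names `cosine`, `answer_question`, `synonym_accuracy` and the toy vectors are hypothetical and are not taken from this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_question(embeddings, word, choices):
    """Return the choice closest to `word`, or None if nothing can be compared."""
    if word not in embeddings:
        return None
    known = [c for c in choices if c in embeddings]
    if not known:
        return None
    return max(known, key=lambda c: cosine(embeddings[word], embeddings[c]))


def synonym_accuracy(embeddings, questions):
    """`questions` is a list of (word, choices, correct_choice) tuples."""
    if not questions:
        return 0.0
    correct = sum(
        1
        for word, choices, gold in questions
        if answer_question(embeddings, word, choices) == gold
    )
    return correct / len(questions)


# Toy, made-up vectors purely for illustration (not trained embeddings).
toy_embeddings = {
    "big": [1.0, 0.1],
    "large": [0.9, 0.2],
    "small": [-1.0, 0.1],
    "red": [0.0, 1.0],
}
toy_questions = [("big", ["large", "small", "red"], "large")]
accuracy = synonym_accuracy(toy_embeddings, toy_questions)
```

In this sketch, questions whose target word is out of vocabulary count as wrong, which is one possible convention; skipping them instead would give a higher reported accuracy.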

GOAL:

Build a skip-gram model with negative sampling that does the following:
Given a specific word in the middle of a sentence (the input word), look at the words nearby and pick one at random. The network then gives, for every word in the vocabulary, the probability of it being the “nearby word” that we chose.
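The idea can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the code in this repository: `skipgram_pairs` enumerates every (input word, nearby word) pair inside a window rather than picking one at random, and `negative_sampling_loss` computes the standard negative-sampling loss for one pair, -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c). The function names, the `window` parameter and the example sentence are hypothetical.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for all words within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss for one true (center, context) pair against sampled negative words.

    The true context word is pushed toward a high score, while each
    negative sample is pushed toward a low score.
    """
    loss = -math.log(sigmoid(dot(context_vec, center_vec)))
    for neg_vec in negative_vecs:
        loss -= math.log(sigmoid(-dot(neg_vec, center_vec)))
    return loss


# Example sentence chosen for illustration only.
tokens = "the quick brown fox jumps".split()
pairs = list(skipgram_pairs(tokens, window=1))
```

Scoring only the true context word and a handful of sampled negatives is what makes negative sampling cheaper than a full softmax over the vocabulary: the cost per training pair depends on the number of negatives, not on the vocabulary size.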

REFERENCE:

Blog:
01.Word2Vec Tutorial - The Skip-Gram Model
02.Word2Vec Tutorial - Negative Sampling
03.Deep Learning实战之word2vec (Deep Learning in Practice: word2vec)
04.Word2Vec and FastText Word Embedding with Gensim
05.A Gentle Introduction to the Bag-of-Words Model
06.Python implementation of Word2Vec

Paper:
01.Distributed Representations of Words and Phrases and their Compositionality
02.Efficient Estimation of Word Representations in Vector Space
03.Word2vec Parameter Learning Explained
04.Linguistic Regularities in Continuous Space Word Representations
05.Evaluation methods for unsupervised word embeddings
06.Word and Phrase Translation with word2vec
07.word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method
08.How to Generate a Good Word Embedding?

Video:
01.Negative Sampling-Coursera Deeplearning

Code:
01.word2vec_commented_in_C
02.word2vec code in python

DATASET:

01.TED-Talks-Dataset
02.TOEFL Synonym Questions
Other datasets:
WordSim353, SNLI, NER, SQuAD, Coref, SRL, SST-5, Parsing
