RottenFruits/gdp

gdp

gdp is a collection of PyTorch code for generating distributed representations.

It includes skip-gram and CBOW models.


Installation

Dependencies

gdp requires:

  • python 3.6.6
  • pytorch-cpu 1.0.0
  • numpy 1.15.4
  • tqdm 4.28.1

User installation

You can install gdp by running the following command.

pip install git+https://github.com/RottenFruits/gdp

Example

skip gram

This example trains a simple skip-gram model.

from gdp import distributed_representation as dr
from gdp import corpus as cp

data = [
    'he is a king',
    'she is a queen',
    'he is a man',
    'she is a woman',
    'warsaw is poland capital',
    'berlin is germany capital',
    'paris is france capital',
]

corpus = cp.Corpus(data = data, mode = "a", max_vocabulary_size = 5000, max_line = 0, 
                   minimum_freq = 0)

window_size = 1
embedding_dims = 30
batch_size = 128

dr_sg = dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size,
                                     model_type = "skip-gram", ns = 0, trace = True)
dr_sg.train(num_epochs = 101, learning_rate = 0.05)
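To make the role of window_size concrete, here is a minimal sketch of how skip-gram training pairs can be built from a sentence. It is an illustration only, not gdp's internal code: the function name skipgram_pairs is hypothetical, and whitespace tokenization is an assumption.

```python
# Illustrative sketch only: not taken from gdp's source.
# skipgram_pairs is a hypothetical helper; tokenization by split() is assumed.
def skipgram_pairs(sentences, window_size):
    """Return (center, context) word pairs for every word in every sentence."""
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, center in enumerate(tokens):
            # Context words lie within window_size positions of the center word.
            lo = max(0, i - window_size)
            hi = min(len(tokens), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(['he is a king'], window_size=1)
for center, context in pairs:
    print(center, '->', context)
```

With window_size = 1, each word is paired only with its immediate neighbours, so the four-word sentence yields six pairs. A larger window produces more pairs per word.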

skip gram with negative sampling

To use negative sampling, set ns = 1 and choose the number of negative samples with negative_samples.

dr_sgns = dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size, 
                                       model_type = "skip-gram", ns = 1, negative_samples = 5, trace = True)
dr_sgns.train(num_epochs = 101, learning_rate = 0.05)
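For readers unfamiliar with negative sampling, the sketch below shows one common way to draw negative words: sample from the unigram distribution raised to the power 0.75, as in the original word2vec work. Whether gdp uses this exact distribution is an assumption; the helper names build_noise_distribution and sample_negatives are hypothetical.

```python
import random
from collections import Counter


# Illustrative sketch only: not taken from gdp's source.
# The 0.75 exponent follows the word2vec convention and is an assumption here.
def build_noise_distribution(sentences, power=0.75):
    """Return the vocabulary and unnormalized sampling weights count**power."""
    counts = Counter(word for s in sentences for word in s.split())
    words = sorted(counts)
    weights = [counts[w] ** power for w in words]
    return words, weights


def sample_negatives(words, weights, exclude, k, rng):
    """Draw k words from the noise distribution, skipping words in exclude."""
    negatives = []
    while len(negatives) < k:
        candidate = rng.choices(words, weights=weights, k=1)[0]
        if candidate not in exclude:
            negatives.append(candidate)
    return negatives


data = ['he is a king', 'she is a queen']
words, weights = build_noise_distribution(data)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
# For the positive pair (king, a), draw 5 negatives that are neither word.
negatives = sample_negatives(words, weights, exclude={'king', 'a'}, k=5, rng=rng)
print(negatives)
```

Each positive (center, context) pair is then trained against these sampled words instead of the full vocabulary, which is what makes negative sampling cheaper than a full softmax.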

etc

To use the CBOW architecture, replace model_type "skip-gram" with "cbow".
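CBOW reverses the direction of prediction: the surrounding words are used to predict the word in the middle. The sketch below builds such (context, target) examples. It is an illustration only, not gdp's internal code; the function name cbow_examples is hypothetical, and whitespace tokenization is an assumption.

```python
# Illustrative sketch only: not taken from gdp's source.
# cbow_examples is a hypothetical helper; tokenization by split() is assumed.
def cbow_examples(sentences, window_size):
    """Return (context_words, target) examples for every word in every sentence."""
    examples = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            left = tokens[max(0, i - window_size):i]
            right = tokens[i + 1:i + 1 + window_size]
            context = left + right
            if context:  # skip one-word sentences, which have no context
                examples.append((context, target))
    return examples


examples = cbow_examples(['she is a queen'], window_size=1)
for context, target in examples:
    print(context, '->', target)
```

Compared with skip-gram, which emits one pair per context word, CBOW emits one example per target word with all of its context words grouped together.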

More example code is in the example directory; please check it too.

Distributed representations

gdp includes:

  • skipgram
  • skipgram with negative sampling
  • cbow
  • cbow with negative sampling
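The four variants above map onto combinations of the model_type and ns arguments shown in the examples. The sketch below only builds the keyword arguments and does not import gdp. That "cbow" combined with ns = 1 selects CBOW with negative sampling is an assumption based on the skip-gram example, and negative_samples = 5 is simply the value used there.

```python
from itertools import product

# Illustrative sketch only: builds keyword arguments, does not call gdp.
# Assumption: "cbow" with ns = 1 behaves like the skip-gram example with ns = 1.
variants = {}
for model_type, ns in product(["skip-gram", "cbow"], [0, 1]):
    kwargs = {"model_type": model_type, "ns": ns}
    if ns == 1:
        kwargs["negative_samples"] = 5  # value taken from the example above
    name = model_type + (" with negative sampling" if ns == 1 else "")
    variants[name] = kwargs

for name, kwargs in variants.items():
    print(name, kwargs)
```

Each dictionary could then be passed along with the positional arguments from the examples, for instance dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size, trace = True, **kwargs).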
