hassyGo/paragraph-vector

paragraph-vector

Paragraph vectors trained with negative sampling.
This project requires Eigen, a C++ template library for linear algebra: http://eigen.tuxfamily.org/index.php?title=Main_Page

An online demo is available at: http://www.logos.t.u-tokyo.ac.jp/~hassy/implementations/paragraph_vector/

ToDo

  • speed up the code
  • make it possible to train vectors for unseen paragraphs, such as paragraphs in test data

USAGE

  1. edit the following line in the Makefile so that it points to your Eigen installation
    EIGEN_LOCATION=$$HOME/local/eigen #Change this line

  2. run the command "make" or run the script "sample.sh"

  3. train a model on your corpus, which should contain one paragraph (or document, or sentence) per line
    ./paragraph_vector -input input.txt -output result
    (run "./paragraph_vector -help" or see Utils.hpp for other options)

  4. use the resulting files for your purpose
    result.bin
    result.pv: each line has a paragraph ID and real values of its vector representation
    result.wv: each line has a word and real values of its vector representation
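
Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not part of the project: the helper names write_corpus, load_vectors and cosine are hypothetical, and the loader assumes that fields in result.pv and result.wv are separated by whitespace and that the files have no header line. Check your own output files before relying on it.

```python
import math
from pathlib import Path


def write_corpus(paragraphs, path):
    """Write one paragraph per line, as the trainer expects.

    Line breaks and repeated whitespace inside a paragraph are collapsed
    to single spaces. Empty paragraphs are dropped, so line positions in
    the file refer to the non-empty paragraphs only.
    """
    lines = [" ".join(p.split()) for p in paragraphs]
    text = "\n".join(line for line in lines if line) + "\n"
    Path(path).write_text(text, encoding="utf-8")


def load_vectors(path):
    """Read a .pv or .wv file into a dict {key: [float, ...]}.

    Assumption (not confirmed by the README): each line is
    "<paragraph ID or word> <value> <value> ...", whitespace-separated.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

After training with ./paragraph_vector -input input.txt -output result, calling load_vectors("result.pv") would return the paragraph vectors keyed by paragraph ID (as strings), and cosine can then compare any two of them. The same loader applies to result.wv under the same format assumption.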

Reference

Quoc Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. 2014. Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188--1196.
