word-embedding-skip-gram

A skip-gram word embedding model implemented in C++. Skip-gram learns a vector for each word by training a shallow network to predict the words that surround it in the training text.
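
For reference (independent of this repository's code), the skip-gram objective from Mikolov et al. (2013) is to maximize the average log-probability of context words given each center word:

$$
\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\; j \ne 0}\log p(w_{t+j}\mid w_t)
$$

where T is the number of words in the training text and c is the context window size.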

In header nlp.h:

#include "word_lib.h"
#include "bp_network.h"

In header word_lib.h:

class wordlib
{
	private:
		std::list<std::string> dictionary;     // all known words, in insertion order
	public:
		wordlib();
		~wordlib();
		int get_place(std::string);            // index of a word in the dictionary
		int lib_size();                        // number of words stored
		bool search(std::string);              // is the word already in the dictionary?
		void add_word(std::string);            // add a single word
		void add_word_from_file(std::string);  // read words from a file
		void print_lib();                      // print the whole dictionary
		void print_word(int);                  // print the word at a given index
		std::string get_word(int);             // return the word at a given index
};

This class records every word that appears in the training set; each word is identified by its position (index) in the dictionary.
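
A minimal usage sketch, assuming the member functions behave as their names suggest (get_place returning the word's index, lib_size the number of stored words):

#include "word_lib.h"
#include <iostream>

int main()
{
	wordlib lib;
	lib.add_word("hello");
	lib.add_word("world");

	// search() reports whether a word is already known;
	// get_place() returns its index in the dictionary.
	if (lib.search("hello"))
		std::cout << "index of 'hello': " << lib.get_place("hello") << '\n';

	std::cout << "dictionary size: " << lib.lib_size() << '\n';
	return 0;
}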

In header bp_network.h:

struct neuron;                 // a single network unit
double tanh(double x);         // hyperbolic tangent activation
double difftanh(double x);     // derivative of tanh
double sigmoid(double x);      // logistic sigmoid activation
double diffsigmoid(double x);  // derivative of sigmoid
class word2vec;                // computes each word's embedding vector
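
These are the standard neural-network activation functions and their derivatives. A minimal sketch of plausible definitions, assuming each derivative takes the pre-activation x as its argument (the repository's tanh presumably wraps std::tanh; its actual code may differ):

#include <cmath>

// derivative of tanh: d/dx tanh(x) = 1 - tanh(x)^2
double difftanh(double x)
{
	double t = std::tanh(x);
	return 1.0 - t * t;
}

// logistic sigmoid: 1 / (1 + e^(-x))
double sigmoid(double x)
{
	return 1.0 / (1.0 + std::exp(-x));
}

// derivative of sigmoid: s * (1 - s), where s = sigmoid(x)
double diffsigmoid(double x)
{
	double s = sigmoid(x);
	return s * (1.0 - s);
}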

The word2vec class computes each word's embedding vector, printing progress to the screen and writing the results to a file.

The default input file name is "trainingset.txt"; you can change it in word_embedding.cpp.

The most important thing

To change the length of the embedding vectors, open bp_network.h and find the function void word2vec::initializing(). Change HNUM at the beginning of that function; this parameter determines the vector length (the embedding dimension).
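
To see why HNUM is the vector length: in a skip-gram network with a vocabulary of V words and HNUM hidden neurons, the input-to-hidden weight matrix has shape V x HNUM, and row i of that matrix is the embedding of word i. A hypothetical illustration (make_embedding_table is not part of this repository):

#include <vector>

// one row per vocabulary word; each row is that word's HNUM-dimensional vector
std::vector<std::vector<double>> make_embedding_table(int vocab_size, int HNUM)
{
	return std::vector<std::vector<double>>(
		vocab_size, std::vector<double>(HNUM, 0.0));
}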
