
Embeddings

Training Word Embeddings and using them to perform Sentiment Analysis

About

Word embeddings are vector representations of the words of a language, trained over large collections of raw text. Here, we train word embeddings using the approach of the Neural Probabilistic Language Model (Bengio et al., 2003). We use a large corpus of English text, drawn from a collection of novels from the 1880s, to create a large number of context-target training pairs. The embeddings are 64-dimensional and are built over a vocabulary of the 25,000 most frequent words of the corpus.
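As a rough illustration, here is a minimal PyTorch sketch of an NPLM-style trainer. Only the 64-dimensional embeddings and the 25,000-word vocabulary come from the description above; the context window, hidden size, and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 25_000   # most frequent words of the corpus
EMBED_DIM = 64        # dimensionality of the trained embeddings
CONTEXT_SIZE = 4      # assumed context window (not specified here)
HIDDEN_DIM = 128      # assumed hidden layer size

class NPLM(nn.Module):
    """Predict a target word from the concatenated embeddings of its context."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.hidden = nn.Linear(CONTEXT_SIZE * EMBED_DIM, HIDDEN_DIM)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, context):                # context: (batch, CONTEXT_SIZE)
        e = self.embed(context).flatten(1)     # concatenate context embeddings
        h = torch.tanh(self.hidden(e))
        return self.out(h)                     # logits over the target word

model = NPLM()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on a random context-target batch.
context = torch.randint(0, VOCAB_SIZE, (32, CONTEXT_SIZE))
target = torch.randint(0, VOCAB_SIZE, (32,))
loss = loss_fn(model(context), target)
loss.backward()
optimizer.step()
```

After training, the rows of `model.embed.weight` are the word embeddings.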

The trained embeddings are evaluated on a set of simple analogical reasoning tasks, and then mapped to 2-D using the t-SNE algorithm for visualization. The embeddings are further used to perform sentiment analysis on the IMDB dataset of movie reviews. An attention-based LSTM model with a maximum input sequence length of 300 is used for this task; its architecture is sketched below.
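A minimal PyTorch sketch of such an attention-based LSTM classifier, assuming a single LSTM layer, additive attention over the hidden states, and an illustrative hidden size (only the embedding dimension, vocabulary size, and maximum sequence length come from the description above):

```python
import torch
import torch.nn as nn

MAX_LEN = 300   # maximum input sequence length, as stated above

class AttentionLSTM(nn.Module):
    """Sketch of an attention-based LSTM sentiment classifier."""
    def __init__(self, vocab_size=25_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        # In the actual pipeline this layer would be initialized
        # from the pretrained word embeddings.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)   # scores each timestep
        self.out = nn.Linear(hidden_dim, 1)    # positive/negative logit

    def forward(self, tokens):                 # tokens: (batch, <= MAX_LEN)
        h, _ = self.lstm(self.embed(tokens))   # h: (batch, seq, hidden_dim)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over timesteps
        context = (weights * h).sum(dim=1)     # weighted sum of LSTM states
        return self.out(context).squeeze(-1)

# Example: score a random batch of token-id sequences.
logits = AttentionLSTM()(torch.randint(0, 25_000, (8, MAX_LEN)))
```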

Demo

Here is a 2-D plot of the embeddings of some words (the high-dimensional embeddings are mapped to 2-D using the t-SNE algorithm).
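For reference, a plot like this can be produced with scikit-learn's t-SNE; the word list and random vectors below are stand-ins for the actual trained embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in data: in practice, `embeddings` holds the trained 64-dim vectors
# and `words` the corresponding vocabulary entries.
words = ["sad", "happy", "two", "three", "cat", "dog"]
embeddings = np.random.randn(len(words), 64)

# Map 64-dim vectors to 2-D (perplexity must stay below the sample count).
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.show()
```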

As you can see, words conveying emotions, such as "sad", "melancholy", "happy", and "merry", are grouped together. The same holds for number words ("two", "three", "four", "five"), words with similar meanings ("under" and "beneath"; "run", "jump", and "leap"), animals ("cat", "dog", "horse"), and words carrying temporal meaning ("month", "year", "hour"). Words such as "man", "woman", "king", and "queen" are grouped together as well.
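Groupings like these can also be probed directly with a cosine-similarity nearest-neighbour lookup; here is a small sketch with an illustrative vocabulary and random stand-in vectors:

```python
import numpy as np

def nearest(word, embeddings, vocab, k=5):
    """Return the k words whose embeddings are closest by cosine similarity."""
    inv = {i: w for w, i in vocab.items()}
    v = embeddings[vocab[word]]
    sims = embeddings @ v / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v))
    return [inv[i] for i in np.argsort(-sims)[1:k + 1]]  # skip the word itself

# Illustrative call on stand-in data; real usage would pass the trained
# embedding matrix and the corpus vocabulary.
vocab = {w: i for i, w in enumerate(["sad", "melancholy", "happy", "merry", "cat", "dog"])}
print(nearest("sad", np.random.randn(len(vocab), 64), vocab))
```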
