A Twitter sentiment analysis (TSA) project showing the effect of different training-corpus sizes (numbers of tweets) on word2vec models.
This repository is based on the following two experiments:

- Comparison between different sizes (numbers of tweets trained on) of a self-trained word2vec model: The word2vec models are trained on a predefined number of tweets and then compared to each other.
- Comparison between a specific word2vec model trained on the tweets themselves and a pre-trained word2vec model (`word2vec-google-news-300`): The domain-dependent (marked as `specific` in the filename) word2vec model trained on the largest number of tweets (from Experiment 1) is compared to the general (marked as `unspecific` in the filename) pre-trained word2vec model.
The data is taken from https://github.com/prateekjoshi565/twitter_sentiment_analysis and is originally from a competition about TSA (https://datahack.analyticsvidhya.com/contest/practice-problem-twitter-sentiment-analysis/).
However, you are not limited to using these datasets. Feel free to use your own datasets!
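The cleaned data is split into train, validation, and test sets (see the file structure further down). A minimal sketch of such a split with scikit-learn follows; the column names, split ratios, and toy data here are assumptions, not taken from `code/run.py`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for data/cleaned/data.csv; column names are assumed.
data = pd.DataFrame({
    "tweet": [f"tweet {i}" for i in range(100)],
    "label": [i % 2 for i in range(100)],
})

# Hold out 20% for testing, then 25% of the remainder for validation.
train, test = train_test_split(
    data, test_size=0.2, random_state=42, stratify=data["label"]
)
train, validation = train_test_split(
    train, test_size=0.25, random_state=42, stratify=train["label"]
)

print(len(train), len(validation), len(test))  # 60 20 20
```

Stratifying on the label keeps the class distribution comparable across the three splits, which matters for an imbalanced sentiment dataset like this one.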
Use

```shell
pip install -r requirements.txt
```
If this does not work, you will need the following dependencies:
- Python 3.8 (you can use Simple Python version management (pyenv) to easily manage different versions)
- Python Development Workflow for Humans (Pipenv)
and use

```shell
pyenv install 3.8.10
pyenv local 3.8.10
pipenv install
```
Then run

```shell
python code/run.py
```

(feel free to comment out parts before running).
Program structure:
The program will perform the following operations (in yellow or blue) on the data (in grey):
It will also generate visualisations to summarise the results. The generation of all models and visualisations will take about 5 min (depending on your hardware). Afterwards you will see the following additional file structure:
```
./
├── data
│   └── cleaned
│       ├── data.csv
│       ├── test.csv
│       ├── train.csv
│       └── validation.csv
├── models
│   └── word2vec
│       ├── 100_tweets_w2v.model
│       ├── 1000_tweets_w2v.model
│       ├── 10000_tweets_w2v.model
│       ├── 20000_tweets_w2v.model
│       └── 26971_tweets_w2v.model
└── plots
    ├── data
    │   ├── 1-class_distribution_pie_chart.pdf
    │   ├── 2-word_freq-pos_tweets.pdf
    │   ├── 3-word_freq-neg_tweets.pdf
    │   ├── 4-word_freq_with_tweets-pos_tweets.pdf
    │   ├── 5-word_freq_with_tweets-neg_tweets.pdf
    │   ├── 6-word_freq_with_words-pos_tweets.pdf
    │   └── 7-word_freq_with_words-neg_tweets.pdf
    ├── test
    │   ├── 1-class_distribution_pie_chart.pdf
    │   ├── 2-word_freq-pos_tweets.pdf
    │   ├── 3-word_freq-neg_tweets.pdf
    │   ├── 4-word_freq_with_tweets-pos_tweets.pdf
    │   ├── 5-word_freq_with_tweets-neg_tweets.pdf
    │   ├── 6-word_freq_with_words-pos_tweets.pdf
    │   ├── 7-word_freq_with_words-neg_tweets.pdf
    │   ├── 8-evaluation_f1-specific_w2v_models.pdf
    │   ├── 9-evaluation_mcc-specific_w2v_models.pdf
    │   ├── 10-evaluation_f1-unspecific_w2v_models.pdf
    │   └── 11-evaluation_mcc-unspecific_w2v_models.pdf
    ├── train
    │   ├── 1-class_distribution_pie_chart.pdf
    │   ├── 2-word_freq-pos_tweets.pdf
    │   ├── 3-word_freq-neg_tweets.pdf
    │   ├── 4-word_freq_with_tweets-pos_tweets.pdf
    │   ├── 5-word_freq_with_tweets-neg_tweets.pdf
    │   ├── 6-word_freq_with_words-pos_tweets.pdf
    │   └── 7-word_freq_with_words-neg_tweets.pdf
    └── validation
        ├── 1-class_distribution_pie_chart.pdf
        ├── 2-word_freq-pos_tweets.pdf
        ├── 3-word_freq-neg_tweets.pdf
        ├── 4-word_freq_with_tweets-pos_tweets.pdf
        ├── 5-word_freq_with_tweets-neg_tweets.pdf
        ├── 6-word_freq_with_words-pos_tweets.pdf
        ├── 7-word_freq_with_words-neg_tweets.pdf
        ├── 8-evaluation_f1-specific_w2v_models.pdf
        ├── 9-evaluation_mcc-specific_w2v_models.pdf
        ├── 10-evaluation_f1-unspecific_w2v_models.pdf
        └── 11-evaluation_mcc-unspecific_w2v_models.pdf
```
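The evaluation plots (files 8–11) report the F1 score and the Matthews correlation coefficient (MCC). A minimal sketch of computing both with scikit-learn, on hypothetical labels and predictions (1 = positive tweet):

```python
from sklearn.metrics import f1_score, matthews_corrcoef

# Hypothetical gold labels and classifier predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

f1 = f1_score(y_true, y_pred)            # 2*TP / (2*TP + FP + FN) = 4/6
mcc = matthews_corrcoef(y_true, y_pred)  # uses all four confusion-matrix cells

print(round(f1, 4), round(mcc, 4))  # 0.6667 0.1667
```

Unlike F1, the MCC also rewards true negatives, which makes it the more informative of the two metrics on class-imbalanced data such as this dataset.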