aleceress/mbti_sentiment

MBTI Sentiment Analysis

This repository contains a sentiment analysis of posts from the subreddits dedicated to the different Myers-Briggs personality types.

Data

For each personality type, data were scraped from the posts of the corresponding subreddit (e.g. r/infj for the INFJ personality type), using the subreddit_post_scraper.py script.

Models

The following models were applied.

Analysis

The notebooks type_analysis.ipynb and aggregate_analysis.ipynb contain visualizations of the analysis.

Type analysis

type_analysis.ipynb shows:

  • The POSITIVE/NEGATIVE percentage and the average emotion associated with each personality subreddit, along with a comparison across subreddits.
  • A frequency wordcloud for each subreddit.
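
As a rough illustration of the first point, the sketch below computes the share of POSITIVE and NEGATIVE posts per subreddit. It is not taken from the notebook: the sample data and the function name label_percentages are hypothetical, and it assumes each post has already been assigned a single label by a sentiment model.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (subreddit, predicted label) pairs.
posts = [
    ("infj", "POSITIVE"),
    ("infj", "NEGATIVE"),
    ("infj", "POSITIVE"),
    ("entp", "NEGATIVE"),
    ("entp", "NEGATIVE"),
    ("entp", "POSITIVE"),
    ("entp", "NEGATIVE"),
]


def label_percentages(rows):
    """Return {subreddit: {label: percentage of that subreddit's posts}}."""
    counts = defaultdict(Counter)
    for subreddit, label in rows:
        counts[subreddit][label] += 1

    result = {}
    for subreddit, label_counts in counts.items():
        total = sum(label_counts.values())
        result[subreddit] = {
            label: 100.0 * n / total for label, n in label_counts.items()
        }
    return result


percentages = label_percentages(posts)
for subreddit, shares in sorted(percentages.items()):
    print(subreddit, {k: round(v, 1) for k, v in sorted(shares.items())})
```

The same per-subreddit table is a natural input for a side-by-side bar chart when comparing types.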

Aggregate analysis

aggregate_analysis.ipynb includes:

  • A study of whether the Myers-Briggs traits (Extraversion/Introversion, Sensing/Intuition, Thinking/Feeling and Judging/Perceiving) are associated with the sentiment/emotion scores of subreddit posts. The association was tested with the Chi-Squared statistic and the Odds Ratio, and the results are presented as visualizations.
  • Visualizations of the same study, but grouping personalities by Dominant Cognitive Function.
  • A clustering of personalities based on POSITIVE/NEGATIVE percentage and average emotions of their subreddit posts.
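
The Chi-Squared and Odds Ratio computation mentioned in the first point can be sketched for a single trait as follows. This is a minimal standard-library example, not the notebook's code: the counts are made up, the function names are hypothetical, and it assumes a 2x2 table (one trait dichotomy against POSITIVE/NEGATIVE) with no continuity correction.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 dof.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a, b, c, d):
    """Odds of the first column in row 1 divided by the same odds in row 2."""
    return (a * d) / (b * c)


# Hypothetical counts of posts:
#                 POSITIVE  NEGATIVE
# Introversion        60        40
# Extraversion        45        55
a, b, c, d = 60, 40, 45, 55

chi2, p_value = chi_squared_2x2(a, b, c, d)
ratio = odds_ratio(a, b, c, d)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, odds ratio={ratio:.3f}")
```

An odds ratio above 1 here would mean that posts in the first row are more likely to be POSITIVE than posts in the second row; the chi-squared p-value indicates whether that difference is plausibly due to chance.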

Running

To run the code in this repository, create a virtual environment and install the dependencies with the following commands.

virtualenv venv 
source ./venv/bin/activate
pip install -r requirements.txt

To execute subreddit_post_scraper.py, you first need a running MySQL database to connect to. You also need the parameters for your Reddit account and for the MySQL database; all of them must be placed in a config.py file that follows the schema of config.example.py.