williamchanrico/follower-gender-classification

Follower Gender Classifier

🏡
Know the gender of your followers!
A quick hack to support my short text classification thesis. This code serves as a quick demo and is NOT maintained!

Introduction

[Screenshot: screenshot1]

Goal

The goal of this code demo is to predict the gender of social media users from the comments on their Instagram profiles, using AdaBoost, XGBoost, Support Vector Machine, and Naive Bayes classifiers combined with grid search and K-fold cross-validation.
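
As a rough illustration of how grid search and K-fold cross-validation fit together, here is a dependency-free sketch. It is not the project's code: the function names, the toy data, and the fixed-threshold "classifier" are all assumptions made for the example, and a real run would plug one of the classifiers above into `evaluate`.

```python
from itertools import product

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def grid_search(param_grid, evaluate, n_samples, k=5):
    """Try every parameter combination and keep the best mean fold score.

    `evaluate(params, train_idx, test_idx)` is supplied by the caller; it
    should train on train_idx and return a score measured on test_idx.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, te) for tr, te in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Toy data (hypothetical): one numeric feature and a binary label per sample.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def evaluate(params, train_idx, test_idx):
    """Accuracy of a fixed-threshold rule on the held-out fold (nothing to fit)."""
    correct = sum((xs[i] > params["threshold"]) == bool(ys[i]) for i in test_idx)
    return correct / len(test_idx)

best_params, best_score = grid_search({"threshold": [2.5, 5.5, 8.5]}, evaluate, len(xs), k=5)
print(best_params, best_score)
```

The folds here are contiguous slices, so real data should be shuffled first. Each parameter combination is scored on every fold, and the combination with the highest mean score wins.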

How many are males vs females?

We collect the comments on followers' Instagram picture posts and, after other pre-processing, format them as a bag-of-words.
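
A bag-of-words representation can be sketched as follows. The regex tokenizer is a simplified stand-in chosen for the example (the setup steps below download the nltk punkt tokenizer data, which the project presumably uses instead), and the sample comments are made up.

```python
import re
from collections import Counter

def tokenize(comment):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", comment.lower())

def bag_of_words(comments):
    """Count how often each token occurs across a list of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts

# Hypothetical comments left on one follower's posts.
comments = ["Looking good bro!", "Good shot, bro"]
features = bag_of_words(comments)
print(features)
```

Word order is discarded; only the per-token counts are kept as features for the classifiers.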

Data Labelling

To label the training data, we use the Azure Face API to keep only pictures that contain exactly one person and for which the API is able to detect that person's gender.
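
The filtering rule described above might look like the sketch below. The response shape (a list of detected faces, each with a `faceAttributes.gender` field) is an assumption modelled on a typical face-detection result, and the user names and detections are invented for illustration. No API call is made here.

```python
def label_from_faces(faces):
    """Return 'male' or 'female' only when exactly one face with a gender is found."""
    if len(faces) != 1:
        return None  # zero faces or a group photo: ambiguous, so skip it
    gender = faces[0].get("faceAttributes", {}).get("gender")
    return gender if gender in ("male", "female") else None

# Hypothetical detection results keyed by username.
detections = {
    "user_a": [{"faceAttributes": {"gender": "female"}}],
    "user_b": [{"faceAttributes": {"gender": "male"}},
               {"faceAttributes": {"gender": "female"}}],
    "user_c": [],
    "user_d": [{"faceAttributes": {}}],
}

labels = {}
for user, faces in detections.items():
    label = label_from_faces(faces)
    if label is not None:
        labels[user] = label

print(labels)
```

Only `user_a` receives a label; the other three are dropped because the picture has several faces, no face, or no detected gender.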

Overview

The code demo consists of several parts:

  • A frontend built with socketio, Flask, HTML, and JavaScript.
  • Four classifier implementations: XGBoost, Support Vector Machine, Naive Bayes, and AdaBoost.

adab/

Implements AdaBoost using the sklearn library.

app.py

Main entrypoint for the Flask application (this project).

data/

Data dump(s) or saved pickle files (cache, models, etc.).

screenshots/

Screenshots.

naive_bayes/

Implements the Naive Bayes algorithm using the nltk library.

svm

Implements the Support Vector Machine algorithm using the sklearn library.

thirdparty/

Third-party libraries supporting this project.

xgb

Implements the eXtreme Gradient Boosting algorithm using the xgboost library.

Getting Started

To keep things simple, app.py loads its configuration via the decouple package.

Run cp .env.example .env and fill in the necessary variables.
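
To show how values in .env reach the application, here is a simplified stand-in for the kind of lookup decouple performs: process environment first, then the .env file, then a default. It is not decouple's implementation, and the variable names `APP_PORT` and `APP_DEBUG` are hypothetical; check .env.example for the real ones.

```python
import os

def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def get_setting(name, file_settings, default=None, cast=str, environ=os.environ):
    """Look up a setting: environment variable, then .env value, then default."""
    raw = environ.get(name, file_settings.get(name, default))
    if raw is None:
        raise KeyError(f"{name} is not set and has no default")
    return cast(raw)

# Hypothetical .env contents; the real variable names live in .env.example.
env_text = """
# application settings
APP_PORT=9000
APP_DEBUG=false
"""
file_settings = parse_env(env_text)

port = get_setting("APP_PORT", file_settings, cast=int, environ={})
overridden = get_setting("APP_PORT", file_settings, cast=int, environ={"APP_PORT": "8080"})
print(port, overridden)
```

Passing `environ` explicitly keeps the example deterministic; in normal use the default `os.environ` lets a real environment variable override the value from the file.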

Docker

docker run --env-file=.env -p 9000:9000 williamchanrico/follower-gender-classification:v0.1.0

Manual

Assuming you have virtualenv and python3-pip installed:

  • virtualenv venv && source venv/bin/activate
  • pip3 install -r requirements.txt
  • python -m nltk.downloader punkt
  • Optionally, you may need libgomp1 depending on your operating system (required by xgboost)
  • python3 app.py

Python Version

$ python --version
Python 3.7.1

License

GNU General Public License v3.0