
Quick notebook to refer to different ways to handle imbalanced datasets.


SahilChachra/Handling-Imbalanced-Dataset


Handling Imbalanced Dataset

This project shows how to handle an imbalanced dataset using various methods!


                              

😇 Motivation

While learning Machine Learning, I came across a few highly imbalanced datasets, which left me stuck at the very beginning. So I made a notebook for quickly referring to and revising different ways to handle imbalanced datasets.

⭐ Features

  1. Under-sampling
  2. Over-sampling
  3. imbalanced-learn module
  4. Random over-sampling and under-sampling
  5. Tomek links
  6. SMOTE
  7. Over-sampling followed by under-sampling
  • Trained a Logistic Regression model on each resampled version of the data
  • Used recall score as the metric to compare how well each model performs

📁 Dataset

The dataset used in this notebook can be downloaded from Kaggle - Click to Download

❤️ Owner

Made with ❤️  by Sahil Chachra

👀 License

MIT © Sahil Chachra
