Multi-class Classification

This repo comprises the code developed for multi-class classification. The goal is to use the input variables to correctly classify or predict the target variable, which is a multi-class categorical variable. There are a total of 150 input variables in the data.
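
For a quick first look at data of this shape, a minimal pre-EDA sketch is shown below; the file name data/train.csv and the column name target are hypothetical, since the actual names are not given here.

# Hypothetical file and column names, for illustration only.
import pandas as pd

df = pd.read_csv("data/train.csv")
print(df.shape)                                                # expect 150 input columns plus the target
print(df["target"].value_counts(normalize=True))               # class balance of the multi-class target
print(df.isna().mean().sort_values(ascending=False).head(10))  # columns with the most missing values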

This repo is uploaded to GitHub to demonstrate the use of agile practices and git throughout the exercise.

To see the final report, download 'Final_Report.html'.

Table of Contents

  • 1. About the Project
  • 2. Getting Started
  • 3. Set up your environment
  • 4. Open your Jupyter notebook
  • 5. Final Report

Structuring a repository

An integral part of having reusable code is a sensible repository structure: which files we have and how we organise them.

  • Folder layout:
multiclass_classification
├── docs
├── data
├── models
│   └── models_new
├── results
├── src
│   ├── analysis
│   │   ├── __init__.py
│   │   └── analysis.py
│   ├── train
│   │   ├── __init__.py
│   │   └── train.py
│   └── Config.py
├── .gitignore
├── README.md
└── requirements.txt

1. About the Project

The following is a summary of the data science/ML/deep learning approaches used in this project (a short illustrative sketch follows the list):

  • Data import and Pre-EDA
  • Data cleaning/ Missing data imputation
  • EDA: Outlier analysis
  • EDA: Statistical/Distribution analysis
  • EDA: Feature Engineering and Selection
  • Modelling - Baseline Model
  • Modelling - Hyperparameter Tuning and Regularization
  • Modelling - Regularization
  • Model Evaluation: Selection of Metrics
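
As a quick illustration of how steps like these might fit together, here is a minimal sketch assuming scikit-learn, lightgbm, and scikit-optimize, with a hypothetical data/train.csv file and target column. It is illustrative only and not the repo's actual implementation, which lives in src/analysis and src/train.

# Illustrative sketch only; package choices, file name, and column name are assumptions.
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from skopt import BayesSearchCV
from skopt.space import Real

df = pd.read_csv("data/train.csv")                    # hypothetical file name
X, y = df.drop(columns=["target"]), df["target"]      # hypothetical target column
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

# Baseline pipeline: impute missing values, scale, reduce with PCA, then fit a baseline model.
baseline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),                  # keep components explaining 95% of the variance
    ("model", LGBMClassifier(random_state=42)),
])
print(cross_val_score(baseline, X_train, y_train, cv=5, scoring="f1_macro").mean())

# Bayesian hyperparameter tuning of a regularized (l1/l2 elastic-net) logistic regression.
logreg = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000)),
])
search = BayesSearchCV(
    logreg,
    {"model__C": Real(1e-3, 1e2, prior="log-uniform"),   # inverse regularization strength
     "model__l1_ratio": Real(0.0, 1.0)},                 # mix between l1 and l2 penalties
    n_iter=25, cv=5, scoring="f1_macro", random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))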

2. Getting Started - Clone the repository locally

Open Git Bash at any preferred folder location and run the following command:

git clone https://github.com/gracengu/multiclass_classification.git

Alternatively, you can download the zip file of the repository from the top of the repository's main page. This is not recommended, but it is a good option if you prefer not to use git or don't have experience with it.

3. Set up your environment

Note: the following instructions are specifically for pip users only.

The best practice is to create an isolated environment to avoid dependency conflicts in Python. If this is the first
time you're setting up your compute environment, first install pip and virtualenv.

python get-pip.py
pip install virtualenv

After installation, set up the virtual environment. If you have multiple Python versions installed on your PC, you may want to specify the Python version to use.

virtualenv --python=<path of python.exe with a specific version> python3.7_multiclass

To activate your environment, run the activate script from the virtualenv you have created:

python3.7_multiclass\Scripts\activate

In requirements.txt, update the local path of the tensorflow wheel so that it points to its location on your machine, for example:

tensorflow-cpu @ file:///C:/Projects/2021/multiclass_classification/support/tensorflow_cpu-2.6.0-cp37-cp37m-win_amd64.whl

Install all of the packages listed in requirements.txt using the following command:

pip install -r requirements.txt

If there is an error installing tensorflow, comment out the tensorflow line in requirements.txt and run the following commands:

pip install .\package\tensorflow_cpu-2.6.0-cp37-cp37m-win_amd64.whl
pip install -r requirements.txt

4. Open your Jupyter notebook

  1. You will have to install a new IPython kernelspec to run the jupyter notebook in an isolated environment.
ipython kernel install --name python3.7_multiclass --user

You can change the --name to anything you want.

  2. In the terminal, execute jupyter notebook.

Navigate to the notebooks directory and open a notebook:

  • Baseline Model: Baseline Model.ipynb
  • Hyperparameter Tuning: Hyperparameter Tuning.ipynb

5. Final Report

The final report is written in the Jupyter notebook 'Final report.ipynb'. Run the notebook from start to end to generate the HTML report.
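
If you prefer to execute the notebook and produce the HTML export from a script rather than the Jupyter UI, here is a minimal sketch using nbconvert's Python API; this is an assumption about how the report can be generated, not necessarily how the repo does it, and it reuses the kernelspec name installed above.

# Minimal sketch, assuming nbformat and nbconvert are installed in the environment.
import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read("Final report.ipynb", as_version=4)
# Execute all cells from start to end using the kernel installed earlier.
ExecutePreprocessor(timeout=1800, kernel_name="python3.7_multiclass").preprocess(
    nb, {"metadata": {"path": "."}}
)
# Export the executed notebook to HTML.
body, _ = HTMLExporter().from_notebook_node(nb)
with open("Final_Report.html", "w", encoding="utf-8") as f:
    f.write(body)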

About

[Completed] Complete framework on multi-class classification covering EDA using x-charts and Principal Component Analysis; machine learning algorithms using LGBM, RF, Logistic Regression and Support Vector Machines; as well as a Bayesian optimizer with l1 and l2 regularization for hyperparameter tuning.
