Handwritten Digit Classification on MNIST Dataset, Utilising Only Traditional Machine Learning Techniques and a Custom Feature Extractor

rishz09/prml-course-project

Handwritten Digit Classification (with Custom Feature Extractor)

Classification of the MNIST dataset using only traditional machine learning techniques. This is a course project for Pattern Recognition and Machine Learning at IIT Jodhpur, taught in Semester II of the 2023-24 academic year.

Prerequisites

  • The MNIST dataset must be downloaded before any of the files can be run.
  • The dataset is available at MNIST Dataset.
  • Download the .CSV files for the training and test sets.
  • Create a folder named MNIST_CSV on the main branch, and store the two datasets inside it as MNIST_train.csv and MNIST_test.csv.
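Once the two files are in place, they can be read with a few lines of Python. The sketch below is not taken from the project code. It assumes the common MNIST CSV layout, where the first column is the digit label and the remaining columns are pixel intensities from 0 to 255; the function name `load_mnist_csv` and the `has_header` flag are hypothetical. Check your download for a header row before relying on it.

```python
import csv
import io


def load_mnist_csv(handle, has_header=False):
    """Parse rows of the assumed form label,pixel_0,...,pixel_N.

    Returns (labels, images) with pixel values scaled from 0-255 to 0.0-1.0.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the column-name row, if the file has one
    labels, images = [], []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        labels.append(int(row[0]))
        images.append([int(value) / 255.0 for value in row[1:]])
    return labels, images


# Tiny in-memory stand-in with 4 "pixels" per row instead of 784.
sample = io.StringIO("5,0,255,51,0\n0,255,255,0,0\n")
labels, images = load_mnist_csv(sample)
```

For the real data, pass an open file instead of the in-memory sample, for example `open("MNIST_CSV/MNIST_train.csv", newline="")`.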
    Instructions for saving files and images

    Keep pickle.dump() and plt.savefig() statements uncommented if you wish to save the trained classifiers or images.
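As a rough illustration of what the pickle.dump() statements do, the sketch below saves an object to a .pkl file and loads it back. The `model` dictionary is a hypothetical stand-in for a trained classifier, and the file name and temporary directory are assumptions made for the example, not paths used by the project.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained classifier; any picklable object works.
model = {"name": "demo_classifier", "n_neighbors": 3}

# Assumed output location for this example only.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "demo_classifier.pkl"

# Save step: comment this block out to skip writing the file.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload later without retraining.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

Saving figures works the same way in spirit: a call such as plt.savefig("figure.png") writes the current Matplotlib figure to disk, and commenting it out skips the write.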

    Order of execution of files

    1. experiment_with_classifiers - Trains different classifiers on different variations of the MNIST dataset.
    2. augmentation_code - Generates variations of the training dataset to provide more training examples for the best models. Creates and saves a .feather file.
    3. save_custom_transformed_data - Creates and saves two .feather files, which are used for training and testing the best models.
    4. best_models - Trains the best models on the augmented dataset and saves the classifiers as .pkl files for later use.
    5. prediction_real_img - Predicts two handwritten digits, 3.jpg and 7.jpg, photographed with a camera. Some preprocessing steps are applied before prediction.

      The files above must be executed in the order listed; they will not work properly otherwise.
      gen_augmentation_images shows the different variations of a single image after data augmentation. It can be run after file 2 has been executed.
      failure_case_best_model performs failure case analysis of the best model obtained after training. It can be run after file 4 has been executed.

    Types of Datasets used for training

    • Original dataset (with normalisation)
    • Principal Component Analysis (dimensionality reduction)
    • Linear Discriminant Analysis (dimensionality reduction)
    • Edge Detector with Prewitt Kernel (feature extractor)
    • Custom feature extractor
    • Augmented Dataset for larger training set
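To give a sense of what the Prewitt edge-detection variant involves, here is a minimal NumPy sketch using the standard 3x3 Prewitt kernels. It is not the project's implementation: the helper names `filter3x3` and `prewitt_edges` are hypothetical, and zero padding at the image border is an assumption chosen because MNIST backgrounds are black.

```python
import numpy as np

# Standard 3x3 Prewitt kernels: PREWITT_X responds to left-right intensity
# changes, PREWITT_Y (its transpose) to top-bottom changes.
PREWITT_X = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T


def filter3x3(image, kernel):
    """Slide a 3x3 kernel over the image (cross-correlation, zero padding)."""
    padded = np.pad(image, 1, mode="constant")
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return out


def prewitt_edges(image):
    """Return the gradient magnitude, same shape as the input image."""
    gx = filter3x3(image, PREWITT_X)
    gy = filter3x3(image, PREWITT_Y)
    return np.hypot(gx, gy)


# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edges = prewitt_edges(image)
```

In the toy image the response is concentrated in the two columns next to the step and is zero in the flat region to its left. Because of the zero padding, bright pixels touching the image border also produce a response. A flattened 784-value MNIST row would first be reshaped to 28x28 before filtering.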

    Classifiers used

    • K-Nearest Neighbors
    • Decision Trees
    • Linear Regression
    • Naive Bayes (Gaussian and Multinomial)
    • Random Forest
    • AdaBoost
    • Histogram Gradient Boosting Classifier
    • Support Vector Machines with Radial Basis Function Kernel
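As an illustration of the simplest classifier in this list, the sketch below implements K-Nearest Neighbors from scratch with the standard library. It is a teaching example under stated assumptions, not the project's code: the function `knn_predict`, the Euclidean distance metric, and the 2-D toy points are all chosen for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D data: class 0 clusters near (0, 0), class 1 near (1, 1).
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

prediction_a = knn_predict(train_points, train_labels, (0.05, 0.1))
prediction_b = knn_predict(train_points, train_labels, (1.0, 0.95))
```

For MNIST, each training point would be a 784-value pixel vector (or a reduced feature vector) rather than a 2-D coordinate; the voting logic stays the same.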

    Maximum Accuracy Achieved

    98.08%

    Report

    View the report to get an in-depth understanding of the project.

    Slides

    A concise presentation about the project.

    Interface

    • The App folder contains the code for the app hosted on Hugging Face.
    • Link to interface.

    Authors

    • Ankit Kumar (B22CS076)
    • Rishabh Acharya (B22CS090)
    • Pujit Jha (B22CS091)
    • Raj Nandan Singh (B22EE052)
    • Ayush Pekamwar (B22EE084)


    Group No: 32
