MNIST-Analysis

K-fold cross-validation, implemented from scratch, to aid in analysis of the MNIST dataset.
The analysis found that the left side of the digit is the most important region for differentiating between handwritten samples of the numbers 3 and 8. The Hard2ClassifyData folders show that samples with light handwriting are typically hard to classify, as are samples with "scrunched" handwriting, where the lower part of the 3 overextends and almost makes it look like an 8. An example is below:

Results

Important pixels for differentiating between a 3 and an 8, as determined by Logistic Regression (brighter pixels are more important)

Important pixels for differentiating between a 3 and an 8, as determined by Linear SVM (brighter pixels are more important)
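
For readers who want to build a similar map, one common approach with linear models is to take the absolute value of each learned per-pixel weight and arrange the 784 values on the 28x28 image grid. The sketch below assumes that approach; whether important_features.py does exactly this is not stated here, and the function name `importance_map` is hypothetical.

```python
def importance_map(weights, side=28):
    """Turn a flat list of per-pixel weights into a side x side grid.

    Each cell holds |weight| scaled to [0, 1], so 1.0 is the brightest
    (most important) pixel. Assumes row-major pixel order.
    """
    if len(weights) != side * side:
        raise ValueError(f"expected {side * side} weights, got {len(weights)}")

    magnitudes = [abs(w) for w in weights]
    peak = max(magnitudes)
    # Avoid dividing by zero when every weight is zero.
    scaled = [m / peak for m in magnitudes] if peak > 0 else [0.0] * len(magnitudes)

    # Cut the flat list into rows of length `side`.
    return [scaled[r * side:(r + 1) * side] for r in range(side)]


# Toy 2x2 example: the weight with the largest magnitude maps to 1.0.
print(importance_map([0.5, -2.0, 0.0, 1.0], side=2))
# [[0.25, 1.0], [0.0, 0.5]]
```

The resulting grid can be passed to any image-plotting routine; the sign of each weight is discarded, so the map shows how strongly a pixel matters, not which digit it favors.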

The folders Easy2ClassifyData[MLmodel] and Hard2ClassifyData[MLmodel] contain examples of handwriting that is easy for the ML model to classify and hard for the ML model to classify, respectively.
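
How the easy and hard examples were selected is not described here. One plausible way, assuming the model yields a signed decision score per sample and the labels are encoded as -1 and +1, is to rank samples by their signed margin (label times score): large positive margins are easy, and small or negative margins are hard. The function name `rank_by_margin` and the toy numbers are hypothetical.

```python
def rank_by_margin(scores, labels):
    """Return sample indices ordered from hardest to easiest.

    scores: signed decision values (positive means the model predicts +1).
    labels: true labels encoded as -1 or +1 (for example 3 -> -1, 8 -> +1).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # A negative margin is a misclassification; a margin near zero is a
    # correct but low-confidence prediction.
    margins = [s * y for s, y in zip(scores, labels)]
    return sorted(range(len(margins)), key=lambda i: margins[i])


scores = [2.5, -0.1, 0.3, -3.0]
labels = [1, 1, -1, -1]
order = rank_by_margin(scores, labels)
print(order)       # [2, 1, 0, 3]
print(order[:2])   # the two hardest samples
print(order[-2:])  # the two easiest samples
```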

To run

Unzip the MNIST.zip file to obtain the MNIST dataset. Keep the unzipped folder in the same folder as MNIST.zip.
To replicate the results, open one of the following scripts and run it.
- cv_builtin.py fits a Logistic Regression model and a Linear SVM model to the data and prints the tuned hyperparameters as determined by scikit-learn methods.
- cv_scratch.py runs the cross-validation implemented from scratch. The number of folds, K, is defined as a global variable; feel free to change it to experiment with the method.
- important_features.py uses the best lambda as determined by cv_scratch.py (it can be changed in the code) to find the pixels in an image that are important in differentiating between a 3 and an 8.
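
For orientation, the general K-fold procedure can be sketched in plain Python as follows. This is a minimal illustration of the technique, not the code in cv_scratch.py: the function names, the `fit` and `score` callables, and the seeded shuffle are all assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle 0..n_samples-1 and split into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so that fold sizes
    # differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Return the mean validation score over k folds.

    fit(X_train, y_train) must return a model;
    score(model, X_val, y_val) must return a number.
    """
    fold_scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        fold_scores.append(
            score(model, [X[i] for i in held_out], [y[i] for i in held_out])
        )
    return sum(fold_scores) / len(fold_scores)
```

To tune a hyperparameter such as lambda under this sketch, one would call `cross_validate` once per candidate value, with `fit` closing over that value, and keep the candidate with the best mean score.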

hhuang5163/MNIST-Analysis
