MNIST-K-Means-Clustering

Using K-Means Clustering to Identify Handwritten Digits

Uncompress the .tar.gz archive to get the digits.base64.json dataset, which the notebook needs: `tar -xzvf digits.base64.json.tar.gz`
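
If you'd rather not shell out to `tar` (for example on a machine without it), the same extraction can be done from Python with the standard-library `tarfile` module. This is a minimal sketch; the helper name `extract_archive` is my own and does not appear in the notebook.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive into dest_dir.

    Returns the list of member names found in the archive.
    (Hypothetical helper, not part of the notebook.)
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

Calling `extract_archive("digits.base64.json.tar.gz")` from the repository root should leave the dataset next to the notebook, assuming the archive stores the file at its top level.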

Design decision: the clustering algorithm is designed to train on labelled data. However, I've written it so that it's easy to switch to unlabelled data. I considered making it modular for labelled/unlabelled data, but the more I think about it, the less convinced I am of the utility of a k-means clustering algorithm for unlabelled training data. (If your data is unlabelled, you can just place a dummy label on every datapoint.)
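
To illustrate what training on labelled data can mean for k-means, here is a minimal sketch, not the notebook's implementation. It assumes that the labels are used only to name each cluster by majority vote after ordinary (Lloyd's) k-means has run. All function names are hypothetical, and it uses plain Python lists to stay dependency-free; for MNIST-sized data you would want NumPy.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumption: initialise centroids from k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignments = [nearest(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def label_clusters(points, labels, centroids):
    """Name each cluster after the most common label among its members."""
    votes = {}
    for point, label in zip(points, labels):
        votes.setdefault(nearest(point, centroids), Counter())[label] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


def predict(point, centroids, cluster_labels):
    """Predicted label for `point`: the name of its nearest cluster."""
    return cluster_labels.get(nearest(point, centroids))
```

With unlabelled data, passing the same dummy label for every point (for example `["?"] * len(points)`) leaves the clustering itself unchanged; every cluster simply ends up with that dummy name.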

Inspired by a homework assignment in John Lafferty's Large-Scale Data Analysis course, which I took at UChicago in the Spring of 2015. I collaborated with Elliott Ding on that assignment. In the class, we used distributed systems via AWS and Apache Spark, parallelized our code, and did most analysis using map-reduce. To make the computational statistics more accessible, I've rewritten this notebook without distributed techniques.

See my blog post on this project here.
