KMeans algorithm and the Elbow criterion

"The idea behind k-Means Clustering is to take a bunch of data and determine if there are any natural clusters (groups of related objects) within the data.

The k-Means algorithm is a so-called unsupervised learning algorithm. We don't know in advance what patterns exist in the data -- it has no formal classification to it -- but we would like to see if we can divide the data into groups somehow.

For example, you can use k-Means to find what are the 3 most prominent colors in an image by telling it to group pixels into 3 clusters based on their color value. Or you can use it to group related news articles together, without deciding beforehand what categories to use. The algorithm will automatically figure out what the best groups are.

The "k" in k-Means is a number. The algorithm assumes that there are k centers within the data that the various data elements are scattered around. The data that is closest to these so-called centroids become classified or grouped together.

k-Means doesn't tell you what the classifier is for each particular data group. After dividing news articles into groups, it doesn't say that group 1 is about science, group 2 is about celebrities, group 3 is about the upcoming election, etc. You only know that related news stories are now together, but not necessarily what that relationship signifies. k-Means only assists in trying to find what clusters potentially exist."

-- taken from Swift Algorithm Club's explanation of the algorithm
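The assign-then-update loop described in the quote can be sketched in plain Python. This is a minimal illustration (Lloyd's iteration on a toy 2-D dataset), not the code in this repository, which relies on scikit-learn. The function names `squared_distance` and `kmeans`, the sample points, and the fixed seed are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Toy k-means: returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The labels are arbitrary integers: the algorithm only says which points belong together, not what each group means, which matches the caveat in the quote.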

Repository contains:

Code for fitting scikit-learn's K-Means model to the iris dataset.
Code for determining the optimal number of clusters for the K-Means algorithm using the 'elbow criterion'.
IPython notebook combining the above two as an interactive tutorial.
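As a rough idea of what the elbow criterion involves, the sketch below fits scikit-learn's `KMeans` to the iris data for several values of k and records the within-cluster sum of squares (`inertia_`). It is not copied from the scripts in this repository. The range of k (1 to 6), `n_init=10`, and `random_state=0` are assumptions chosen for illustration, and plotting is left out to keep the example short.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Iris feature matrix: 150 samples, 4 measurements each.
X = load_iris().data

k_values = list(range(1, 7))  # candidate cluster counts (assumed range)
inertias = []
for k in k_values:
    # n_init and random_state are fixed only to make runs repeatable.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances to the nearest centroid.
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.2f}")
```

Inertia generally falls as k grows, so the smallest value is not the goal. The "elbow" is the k after which further increases bring only small reductions; plotting `inertias` against `k_values` makes that bend easier to spot.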

Running the notebook:

Clone the repo: git clone https://github.com/analyticalmonk/KMeans_elbow
Change into the repo's directory: cd KMeans_elbow
Install the requirements: pip install -r requirements.txt
Start the notebook: jupyter notebook kmeans_elbow.ipynb

If you are new to Jupyter notebooks, check out the official Quick Start Guide.
