K-Means Clustering


A Python implementation of k-means clustering that maintains data association. Usage examples can be found in the tests directory.

What sets this k-means module apart from others is that it lets the user specify how many dimensions of each data point are used in the clustering operation.

For example, given data where each element is of the form

# Each element would actually be a NumPy array, but the following uses lists for readability.
[
  [1, 2, 3, 4, 5],
  [4, 6, 7, 8, 2],
  ...
]

specifying ndim=3 results in only the first three elements of each data point being used for the clustering calculations.

This is useful for maintaining data association that would otherwise be lost as points are shuffled into clusters. One example is this project's image segmentation implementation (segmentation.py). Another use case is object detection output: given detections of the form

[xmin, ymin, xmax, ymax, conf, label]  # [bounding box, conf, label]

we may want to cluster solely on the bounding-box coordinates while keeping each detection's confidence score and label attached for further processing.
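
As a minimal sketch (the detection values below are made up for illustration; the cluster call uses the same signature shown under "How it Works" below):

import numpy as np
from kmeans import cluster

# Hypothetical detections: [xmin, ymin, xmax, ymax, conf, label]
detections = np.array([
    [ 10.,  12.,  50.,  60., 0.91, 0.],
    [ 12.,  15.,  52.,  63., 0.88, 0.],
    [200., 210., 260., 280., 0.75, 1.],
    [198., 205., 255., 275., 0.80, 1.],
])

# Cluster on the four bounding-box values only (ndim=4); the conf and
# label columns stay attached to their rows in the returned clusters.
clusters, centroids = cluster(detections, k=2, ndim=4, tolerance=0.001)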


Installation

$ python -m pip install kmeans-tjdwill

How it Works

Specifying the k value results in a dict[int, NDArray] where each NDArray contains the elements assigned to that cluster. The keys of this dict range from 0 to k-1, allowing a key to also be used to index the corresponding cluster centroid in the returned centroid array.

Here is an example using the cluster function:

>>> from kmeans import cluster
>>> import numpy as np
>>> np.random.seed(27)   # For reproducible results
>>> data = np.random.random((15, 5)).round(3)
>>> data[0]
array([0.426, 0.815, 0.735, 0.868, 0.383])
>>> # Cluster using only first two dimensions
>>> clusters, centroids = cluster(data, k=3, ndim=2, tolerance=0.001)
>>> centroids
array([[0.9004  , 0.79    ],
      [0.361375, 0.580125],
      [0.801   , 0.143   ]])
>>> clusters  # visually compare centroids with first two elements of each data entry.
{0: array([[0.979, 0.893, 0.21 , 0.742, 0.663],
     [0.887, 0.858, 0.749, 0.87 , 0.187],
     [0.966, 0.583, 0.092, 0.014, 0.837],
     [0.915, 0.705, 0.387, 0.706, 0.923],
     [0.755, 0.911, 0.242, 0.976, 0.304]]),
1: array([[0.426, 0.815, 0.735, 0.868, 0.383],
     [0.326, 0.373, 0.794, 0.151, 0.17 ],
     [0.081, 0.305, 0.783, 0.163, 0.071],
     [0.221, 0.726, 0.849, 0.929, 0.736],
     [0.477, 0.493, 0.595, 0.076, 0.117],
     [0.288, 0.684, 0.52 , 0.877, 0.924],
     [0.489, 0.596, 0.264, 0.992, 0.21 ],
     [0.583, 0.649, 0.911, 0.122, 0.676]]),
2: array([[0.701, 0.181, 0.599, 0.415, 0.514],
     [0.901, 0.105, 0.673, 0.87 , 0.561]])}
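
Because the keys run from 0 to k-1, a key also indexes the matching row of the centroid array, so each cluster stays paired with its centroid. Continuing the session above (output omitted):

>>> for key, members in clusters.items():
...     centroid = centroids[key]  # the key doubles as the centroid's row index
...     print(key, centroid, members.shape)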

Features

  • k-means clustering (no side effects)
  • k-means clustering with animation (2-D & 3-D)
  • image segmentation via the kmeans.segmentation.segment_img function

k-means Animation

Using the view_clustering function

2-D Case (Smallest Tolerance Possible)

kmeans2D_animate.webm

3-D Case (Tolerance = 0.001)

kmeans3D_animate.webm
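
The signature of view_clustering is not shown in this README; the sketch below assumes it mirrors cluster's data, k, ndim, and tolerance parameters, so treat the call as illustrative rather than the confirmed interface.

>>> import numpy as np
>>> from kmeans import view_clustering  # import path assumed
>>> data = np.random.random((200, 2))
>>> # Hypothetical call: parameters assumed to mirror cluster().
>>> view_clustering(data, k=3, ndim=2, tolerance=0.001)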

Image Segmentation

Performs image segmentation based on a user-specified number of color groups (k).

Two options:

  • Averaged Colors
    • k=4: seg_groups04
    • k=10: seg_groups10
  • Random Colors
    • k=4: seg_rand_groups04_cpy
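
The segment_img parameters are not documented in this README, so the sketch below is an assumption (an image array plus a k value); check kmeans.segmentation for the actual signature and for how to choose averaged versus random colors.

>>> import matplotlib.pyplot as plt
>>> from kmeans.segmentation import segment_img
>>> img = plt.imread("photo.jpg")  # any RGB image file
>>> # Hypothetical call: these parameters are assumptions, not the confirmed API.
>>> segmented = segment_img(img, k=4)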


Developed With

  • Python (3.12.1)
  • NumPy (1.26.2)
  • Matplotlib (3.8.4)

However, no features specific to Python 3.12 were used.