flakylib

Flaky Clustering Library (Minimum Sum-Of-Squares Clustering)

Clustering is one of the main methods for getting insight on the underlying nature and structure of data in unsupervised way. The purpose of clustering is organizing a set of data into clusters, such that the elements in each cluster are similar and different from those in other clusters.

K-means is one of the most used and fastest clustering algorithms. Actually, in nowadays K-means represents a big family of algorithms aimed to solving Minimum Sum-of-Square Clustering problem. FlakyLib is a Python library of K-means algorithm family optimized for Big Data clustering. Most of the algorithms in the FlakyLib are parallelized and optimized for high-performance computing with Numba, which translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. All FlakyLib algorithms are aimed to solving Minimum Sum-Of-Squares Clustering problem.

In solving large size problems, there are two major drawbacks of standard K-means technique: (i) since it has to process the Large/Big input dataset, it has heavy computational costs and (ii) it has a tendency to converge to one of the local minima of poor quality. In order to reduce the computational complexity, we collect a clustering techniques that utilize parallelization, memory and computational optimizations. To avoid the local minima convergence problem the different Variable Neighbourhood Search (VNS) techniques was used. Using FlakyLib it is possible to clusterize a tens/hundreds of millions of entities in a reasonable amount of time with efficient memory usage.

In most of the cases the naive K-means algorithm is not provide the best possible solution getting stuck in the nearest pit of local minimum. But using different VNS-based approaches (metaheuristics) it is possible to force K-means moving forward and on every iteration increasing solution quality. VNS allow us to transform the local search algorithms (like naive K-means) to global ones.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
flakylib.py		flakylib.py
flakylib_demo.ipynb		flakylib_demo.ipynb
number_of_clusters_demo.ipynb		number_of_clusters_demo.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

flakylib.py

flakylib.py

flakylib_demo.ipynb

flakylib_demo.ipynb

number_of_clusters_demo.ipynb

number_of_clusters_demo.ipynb

Repository files navigation

flakylib

About

Releases

Packages

Languages

License

R-Mussabayev/flakylib

Folders and files

Latest commit

History

Repository files navigation

flakylib

About

Topics

Resources

License

Stars

Watchers

Forks

Languages