Why is kmeans using flat index faster than scikit learn #2589

sstone-codaio · 2022-11-22T18:51:25Z


Hi, I benchmarked Faiss (on CPU) against scikit-learn for k-means. The benchmark runs on 1M vectors of dimension 33, assigned to 256 clusters, and Faiss significantly outperforms scikit-learn (>10x faster). I was not expecting that: I used the Flat index for the centroids, and the documentation says "Flat indexes just encode the vectors into codes of a fixed size ... At search time, all the indexed vectors are decoded sequentially and compared to the query vectors.", so I would not expect any performance gain. Is there a theoretical reason why Faiss runs so much faster?

Happy to provide more details if needed.

Answered by alexanderguzhva


alexanderguzhva · 2022-12-04T21:17:25Z


It depends on the number of iterations that k-means runs. As far as I remember, Faiss uses 25 iterations by default.
By default, Faiss clusters only a subset of the samples: at most 256 * nclusters points, if I recall correctly. You can see how many points Faiss actually uses by turning on the verbose output.
Faiss uses Intel MKL and some optimized SIMD kernels.
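
To put rough numbers on the subsampling point, here is a minimal back-of-the-envelope sketch in plain Python. It assumes a brute-force assignment step (every point compared against every centroid on each iteration) and takes the defaults as recalled above (25 iterations, 256 points per centroid) rather than checking them against a specific Faiss release. The helper `distance_evaluations` is a hypothetical name used only for illustration.

```python
def distance_evaluations(n_points, n_clusters, n_iterations):
    """Point-to-centroid distance computations for brute-force k-means.

    Each iteration compares every training point with every centroid.
    """
    return n_points * n_clusters * n_iterations


# Benchmark setup from the question.
n_vectors = 1_000_000
n_clusters = 256

# Assumed Faiss defaults, as recalled in the answer above.
assumed_iterations = 25
assumed_points_per_centroid = 256

# Faiss would train on at most this many points.
faiss_points = min(n_vectors, assumed_points_per_centroid * n_clusters)

faiss_cost = distance_evaluations(faiss_points, n_clusters, assumed_iterations)
full_cost = distance_evaluations(n_vectors, n_clusters, assumed_iterations)

# Work saved by subsampling alone, at an equal iteration count.
ratio = full_cost / faiss_cost

print(f"training points used by Faiss: {faiss_points}")
print(f"full-data cost / subsampled cost: {ratio:.1f}x")
```

Under these assumptions, subsampling alone cuts the distance computations by roughly 15x before any difference in iteration count, BLAS, or SIMD kernels is considered. For a like-for-like timing, one option is to pass `niter` and `max_points_per_centroid` as keyword arguments to `faiss.Kmeans` so that it trains on all points, and to set `verbose=True` to confirm the sample count it reports.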
