Why is k-means with a flat index faster than scikit-learn? #2589
Answered by alexanderguzhva
sstone-codaio asked this question in Q&A
Answer selected by alexanderguzhva
Hi, I benchmarked Faiss (on CPU) against scikit-learn for k-means. The benchmark clusters 1M 33-dimensional vectors into 256 clusters, and Faiss significantly outperforms scikit-learn (>10x faster). I did not expect that, because I used a Flat index for the centroids, and the documentation says: "Flat indexes just encode the vectors into codes of a fixed size ... At search time, all the indexed vectors are decoded sequentially and compared to the query vectors." Based on that, it doesn't seem that any performance gain should be expected. Is there a theoretical reason why Faiss runs so much faster?
Happy to provide more details if needed.
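The question assumes that an exhaustive ("flat") comparison cannot be fast. A minimal NumPy sketch (not Faiss code; the function names `assign_naive` and `assign_gemm` are hypothetical) of the k-means assignment step illustrates one relevant point: a flat search still computes all n × k distances, but it does not have to compute them one pair at a time. Expanding ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2 turns the dominant cross term into a single matrix product that an optimized BLAS library can execute efficiently; Faiss's `IndexFlatL2` switches to a BLAS matrix multiply for batched queries (at or above `distance_compute_blas_threshold`). Other differences, such as Faiss's k-means subsampling the training set (`max_points_per_centroid`) and running a fixed number of iterations (`niter`) versus scikit-learn's `KMeans` defaults, may also contribute; this sketch does not measure them.

```python
import numpy as np


def assign_naive(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Flat assignment done pair by pair: one distance per (vector, centroid)."""
    labels = np.empty(len(x), dtype=np.int64)
    for i, v in enumerate(x):
        best, best_d = -1, np.inf
        for j, c in enumerate(centroids):
            d = float(np.sum((v - c) ** 2))
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels


def assign_gemm(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Same flat assignment, but all cross terms come from one matrix product."""
    x_norms = np.einsum("ij,ij->i", x, x)[:, None]                   # ||x||^2, shape (n, 1)
    c_norms = np.einsum("ij,ij->i", centroids, centroids)[None, :]   # ||c||^2, shape (1, k)
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; x @ centroids.T is dispatched to BLAS
    dists = x_norms - 2.0 * (x @ centroids.T) + c_norms
    return np.argmin(dists, axis=1)


# Small illustrative sizes (hypothetical data, d = 33 as in the question).
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 33))
centroids = x[rng.choice(len(x), size=16, replace=False)]

labels_loop = assign_naive(x, centroids)
labels_blas = assign_gemm(x, centroids)
```

Both functions examine every centroid for every vector, so neither prunes any work; the difference lies only in how the arithmetic is batched. Timing the two on larger inputs (for example with `time.perf_counter`) is one way to see how much batching alone matters on a given machine.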