
Lantern’s Performance vs. pgvector - Authenticity and Future Improvements #272

Open
Haskely opened this issue Jan 29, 2024 · 3 comments

Haskely commented Jan 29, 2024

Hi Lantern Team,

I noticed your commitment to performance excellence in the README. However, a comparison on tembo.io (https://tembo.io/blog/postgres-vector-search-pgvector-and-lantern) suggests Lantern falls behind pgvector in certain metrics.

[Screenshot: benchmark chart from the tembo.io post comparing Lantern and pgvector]

Could you comment on:

  • The validity of these findings?
  • Any planned improvements for Lantern?

Looking forward to your insights.

@Ngalstyan4 (Contributor) commented:

Hi @Haskely,

Thanks for your interest in Lantern and for raising this!
You are right that the benchmarks in the README are out of date.
Tembo's findings seem reasonable.

We are doing a major upgrade of Lantern with an improved storage layer (see PR1, PR2).
I think pgvector will soon release parallel index builds, which we can then compare against our external parallel index builds. pgvector has also merged improvements on this front since we last benchmarked it, and we plan to rerun all of those benchmarks after their next release.

So, we are working on improvements, but I won't promise anything here; I'll let the benchmarks speak for themselves once they are out (which should be within 10 days).

In particular, we are working on supporting the new hardware optimizations from usearch, enabling CPU-specific build flags, and adding vector quantization techniques.
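To make the quantization point concrete, here is a toy sketch of scalar (int8) quantization, one common family of vector quantization techniques. This is purely illustrative; the function names are hypothetical and this is not Lantern's or usearch's actual implementation.

```python
# Toy scalar (int8) quantization sketch -- illustrative only, not
# Lantern's or usearch's real API.

def quantize_int8(vector):
    """Scale float components into the int8 range [-127, 127]."""
    scale = max(abs(x) for x in vector) or 1.0
    return [round(x / scale * 127) for x in vector], scale

def dequantize_int8(codes, scale):
    """Approximately reconstruct the original floats."""
    return [c / 127 * scale for c in codes]

v = [0.12, -0.56, 0.98, 0.03]
codes, scale = quantize_int8(v)
approx = dequantize_int8(codes, scale)
# Each component is recovered to within one quantization step, while
# storage drops from 4 bytes (float32) to 1 byte per dimension.
assert all(abs(a - b) <= scale / 127 for a, b in zip(v, approx))
```

The trade-off is the usual one: roughly 4x smaller vectors (and better cache behavior) in exchange for a small, bounded loss of precision in distance computations.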

In the meantime, here are some of the reasons you might still want to consider Lantern:

  • Seamless embedding generation and maintenance - you just insert your text or image data into Postgres, and we create and maintain the corresponding embeddings using one of the supported open-source or proprietary embedding models
  • External index generation, which builds the index in parallel, outside the DB machine, without hogging its resources (this can be done with a couple of clicks in our cloud!)
  • HNSW index tuning experiments
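As a rough illustration of the external, parallel index-build idea in the second bullet (a toy sketch under assumed semantics, not Lantern's implementation): partition the vectors, build a per-partition structure in concurrent workers outside the normal query path, then merge the results.

```python
# Toy sketch of a parallel, out-of-band index build -- not Lantern's code.
from concurrent.futures import ThreadPoolExecutor

def build_chunk(chunk):
    # Stand-in for real per-chunk HNSW construction: precompute each
    # vector's Euclidean norm alongside its id.
    return [(vid, vec, sum(x * x for x in vec) ** 0.5) for vid, vec in chunk]

def parallel_build(vectors, workers=4):
    """Partition vectors round-robin, build each partition concurrently, merge."""
    chunks = [vectors[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(build_chunk, chunks)
    merged = [entry for part in parts for entry in part]
    merged.sort(key=lambda e: e[0])  # restore id order after the merge
    return merged

data = [(0, [3.0, 4.0]), (1, [1.0, 0.0]), (2, [0.0, 2.0])]
index = parallel_build(data, workers=2)
assert [norm for _, _, norm in index] == [5.0, 1.0, 2.0]
```

The point of doing this outside the database machine is that the expensive construction work never competes with query traffic for CPU; only the finished index is shipped back.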

@valentijnvenus commented:

@Ngalstyan4 thanks for the update:

In particular, we are working on supporting the new hardware optimizations from usearch, enabling CPU-specific build flags, and adding vector quantization techniques.

I guess it is not as simple as updating the third_party folder to point to a recent USearch?

@Ngalstyan4 (Contributor) commented:

Right.
For details on what it involves, see unum-cloud/usearch#335 (adding a storage interface) and #262 (building on that interface).
