Vector Store: Provide distance functions as scalar functions #15835

Open
BaurzhanSakhariev opened this issue Apr 12, 2024 · 3 comments
Labels
feature: sql: scalars, needs concrete use-case, needs upvotes

Comments

@BaurzhanSakhariev
Contributor

BaurzhanSakhariev commented Apr 12, 2024

Problem Statement

Some applications use distance between vectors to get nearest neighbours.

The pgvector example suggests using Euclidean distance (the <-> operator, see operators) to get nearest neighbours.

Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
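For readers less familiar with the operator, the query above orders rows by L2 distance to a fixed query vector and keeps the closest few. A minimal Python sketch of the same idea follows; the `nearest_neighbors` helper and the sample `items` list are hypothetical and only illustrate the ordering, not how pgvector or CrateDB evaluate the query.

```python
import heapq
import math


def nearest_neighbors(items, query, k=5):
    """Return the k items closest to `query` by Euclidean (L2) distance.

    `items` is a list of dicts with an "embedding" key (hypothetical layout).
    """
    return heapq.nsmallest(
        k, items, key=lambda item: math.dist(item["embedding"], query)
    )


# Hypothetical sample data, standing in for the `items` table.
items = [
    {"id": 1, "embedding": [1.0, 2.0, 3.0]},
    {"id": 2, "embedding": [3.0, 1.0, 2.0]},
    {"id": 3, "embedding": [4.0, 5.0, 6.0]},
]

closest = nearest_neighbors(items, [3.0, 1.0, 2.0], k=2)
```

Here only the ordering is used. If the distance value itself is needed in the result, a scalar distance function is what provides it.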

In general, computing the distance between vectors is a valid use case in its own right (the similarity case aside), and built-in scalar functions would avoid the UDF workaround.

See also: https://surrealdb.com/blog/whats-new-for-developers-in-surrealdb-beta-10, https://github.com/pgvector/pgvector?tab=readme-ov-file#vector-functions

Some applications require a similarity score (a value in the range [0, 1]). This is tracked in #14801.

Possible Solutions

Expose scalar functions for different types of distance (Euclidean, Manhattan, cosine, ...).
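To make the three distances named above concrete, here is a rough Python sketch of what each would compute. The function names are illustrative assumptions, not a proposed SQL signature, and cosine distance is taken as 1 minus cosine similarity, which is one common convention.

```python
import math


def _check_dimensions(a, b):
    # Distances are only defined for vectors of the same length.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def euclidean_distance(a, b):
    """L2 distance: square root of the sum of squared differences."""
    _check_dimensions(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """L1 distance: sum of absolute differences."""
    _check_dimensions(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero-length vectors."""
    _check_dimensions(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates to 5.0 and `manhattan_distance([0, 0], [3, 4])` to 7.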

Considered Alternatives

@mfussenegger
Member

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

If the result isn't used, this would likely just be a slower version of the existing knn_match

@mfussenegger added the feature: sql: scalars, needs upvotes, and needs concrete use-case labels on Apr 15, 2024
@ckurze

ckurze commented Apr 19, 2024

I think this is the same/similar as #15768? At the moment, we use Euclidean distance by default, but users might want cosine and dot product as alternative ways to calculate similarity.

@BaurzhanSakhariev
Contributor Author

BaurzhanSakhariev commented Apr 19, 2024

I think this is the same/similar as #15768?

Here we track actual distance support. Similarity is F(distance), with values in the range 0-1. The actual distance can also be used to evaluate similarity (see the pgvector example above), but that is a different use case.
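As an illustration of "similarity is F(distance)", here is a small Python sketch. The mapping 1 / (1 + d) is an assumption chosen only because it sends a distance of 0 to a score of 1 and larger distances towards 0; it is not claimed to be the formula behind any existing scoring function.

```python
import math


def similarity_from_distance(distance):
    """Map a non-negative distance to a score in (0, 1].

    Assumed mapping for illustration: F(d) = 1 / (1 + d).
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


# The raw distance and the derived score carry the same ordering information,
# but only the distance is expressed in the units of the vector space.
d = math.dist([0.0, 0.0], [3.0, 4.0])
score = similarity_from_distance(d)
```

Because F is monotonically decreasing, ordering by ascending distance and ordering by descending score give the same ranking; the difference is in which value the application gets to read.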

I think #15768 is actually a duplicate of #14801.

However, #14801 has been closed as "only Euclidean based" (comment), so #15768 covers the remaining (cosine, dot product) similarities.


3 participants