Vector Store: Provide distance functions as scalar functions #15835

BaurzhanSakhariev · 2024-04-12T08:15:01Z

Problem Statement

Some applications use distance between vectors to get nearest neighbours.

The pgvector README suggests using Euclidean (L2) distance, via the <-> operator, to get nearest neighbours.

Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

In general, computing the distance between vectors is a valid use case in its own right (similarity aside), and built-in support would avoid a UDF workaround.

See also: https://surrealdb.com/blog/whats-new-for-developers-in-surrealdb-beta-10, https://github.com/pgvector/pgvector?tab=readme-ov-file#vector-functions

Some applications require a similarity score (a value in the range [0, 1]). This is tracked in #14801.

Possible Solutions

Expose scalar functions for different types of distances (euclidean, manhattan, cosine...)
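To make the request concrete, here is a minimal Python sketch of the three distances named above. The function names are hypothetical and chosen only for illustration; they are not an existing or proposed CrateDB API, and cosine distance is assumed here to mean 1 minus cosine similarity.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # Distances are only defined for vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # L2 distance: square root of the summed squared differences.
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # L1 distance: sum of absolute differences.
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: cosine distance = 1 - cosine similarity.
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Unlike a similarity score, none of these is bounded to [0, 1]: the Euclidean and Manhattan distances are unbounded, and the cosine distance as defined here ranges over [0, 2].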

Considered Alternatives

Implement UDF functions
Use the similarity scalar (Vector Store: Provide similarity distance functions as scalar functions #14801) and derive the distance by inverting the formulas from https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java
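As a rough illustration of the second alternative, the sketch below inverts the Euclidean formula from Lucene's VectorSimilarityFunction, where similarity = 1 / (1 + squared distance). It assumes the similarity scalar follows that formula exactly; the function names are hypothetical and only for illustration.

```python
import math
from typing import Sequence


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Lucene's EUCLIDEAN similarity: 1 / (1 + squared L2 distance).
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared)


def distance_from_similarity(similarity: float) -> float:
    # Invert similarity = 1 / (1 + d^2)  =>  d = sqrt(1 / similarity - 1).
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must be in the range (0, 1]")
    # Clamp at 0 to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 / similarity - 1.0))
```

The inversion works, but it loses floating-point precision for distant vectors, where the similarity gets very close to 0. That is one argument for exposing the distance directly.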

mfussenegger · 2024-04-15T08:16:48Z

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

If the result isn't used, this would likely just be a slower version of the existing knn_match.

ckurze · 2024-04-19T09:32:00Z

I think this is the same as, or similar to, #15768? At the moment we use Euclidean distance by default, but users might want cosine and dot product, since those calculate similarity differently.

BaurzhanSakhariev · 2024-04-19T10:06:00Z

I think this is the same/similar as #15768?

Here we track support for the actual distance. Similarity is F(distance), with values in the range 0-1. The actual distance can also be used to evaluate similarity (see the pgvector example above), but that is a different use case.

I think #15768 is actually a duplicate of #14801.

However, #14801 was closed as "only euclidean based" (comment), so #15768 covers the remaining similarities (cosine, dot product).

This was referenced Apr 12, 2024

support euclidean distance function #6910 (Closed)

Add vector_similarity scalar function (euclidean based) #15832 (Merged)

mfussenegger added the labels "feature: sql: scalars", "needs upvotes", and "needs concrete use-case" on Apr 15, 2024
