Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding vector search #132

Merged
merged 14 commits into from
May 22, 2024
Merged

Adding vector search #132

merged 14 commits into from
May 22, 2024

Conversation

programmatix
Copy link
Contributor

No description provided.

@programmatix
Copy link
Contributor Author

This is currently uploaded just for initial review by the SDK PM(s). No review is requested from the wider team at this time.

@programmatix
Copy link
Contributor Author

Review from the wider SDK team is now welcome following initial PM review.

## Vector search
This is a feature being added to Couchbase Server 7.6 in the FTS service.

The feature will initially be released in a Private Preview, and all SDK additions related to it should be annotated with the platform equivalent of @Stability.Volatile as changes may well be required following user feedback.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the Private Preview may be no more, and that this proposed stability level of the API is now somewhat controversial as a result. Personally I don't feel we should be adding a brand-new API for a potentially volatile feature directly at committed level.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Feature is definitely not private (it is public but could be be Beta or GA). Regardless as per our discussions, we should still mark APIs as volatile.


Important SDK considerations:

* There is some uncertainty on whether it is mandatory to provide a `query` field containing a 'normal' FTS query. This is being followed up with the FTS team. As it makes sense to be able to perform solely a vector search, we will assume for now that will be the case.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jon-strabala can you confirm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised this as a ticket as I've been trying to discover this info since 20th Dec and we now urgently need a response to this.


* Performing a vector search in isolation (without a traditional FTS query).
* A SQL++ statement is used to also perform a vector search.
* This is just passed in the string statement and is expected to return the same query results JSON as at present, so from the SDK POV it's just a passthrough with no SDK modifications needed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jon-strabala can you confirm that the query results will be the same in this mode?

rfc/0052-sdk3-full-text-search.md Outdated Show resolved Hide resolved
rfc/0052-sdk3-full-text-search.md Outdated Show resolved Hide resolved
rfc/0052-sdk3-full-text-search.md Show resolved Hide resolved
### VectorQuery
`VectorQuery` creation is platform-idiomatic:

`VectorQuery.create(String vectorFieldName, float[] vector)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to be more specific and support potential integrations with vector libraries, why not VectorQuery.fromVector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mentioned that construction is platform-idiomatic and this is intended to include the "VectorQuery.create(...)" naming. If new VectorQuery(...), or VectorQuery.fromX(...), or some other form, is more idiomatic and better fits the SDK's existing FTS creation methods, it should use such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For SDKs which use them, is this float64 or float32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float32


The first parameter needs to be mandatory.

`VectorSearch.create(VectorQuery vectorQuery, [VectorSearchOptions options])` can be added to make it easier to send a single `VectorQuery`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this accept VectorSearchOptions (which only has the operator, which would be unused with a single VectorQuery object)? Relatedly, in the case of multiple queries, should the operator being used between them not be a more high-level object rather than being part of general options? For instance:

VectorSearch.fromMultiple(VectorQueryCombination, List<VectorQuery>)
VectorSearch.create(VectorQuery)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this accept VectorSearchOptions (which only has the operator, which would be unused with a single VectorQuery object)

It's forward-looking to when more options will be present in future, but yes, it's not useful right now. Since discovering what a hard time some SDKs have with adding overloads later, I'm trying to anticipate these situations.
I don't have strong feelings on it - can certainly remove it from this optional sugar.

I'm not a big fan of the fromMultiple / VectorQueryCombination idea, I think, for similar reasons - there may well be future options.

SearchRequest request = SearchRequest
.searchQuery(SearchQuery.matchAll())
.vectorSearch(VectorSearch.create(VectorQuery.create("vector_field", vector)));

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we attached to searchQuery for historical reasons (as in what we have today so less ).It would be ideal to use ftsSearch and ftsQuery to align with vectorSearch/vectorQuery respectively- no?
A new developer reading the API without any of the context would find it hard to understand that searchQuery is actually FTS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly - Ray and I have certainly wondered before why it's named "search" and not "fullTextSearch" or similar - but it potentially opens a rather large can of worms. .ftsSearch(SearchQuery.matchAll()) or .fullTextSearch(SearchQuery.matchAll()) both look weirdly inconsistent to me - would probably want something instead like .fullTextSearch(FullTextSearch[Query].matchAll()), which looks cleaner. But we have a lot of classes implementing that SearchQuery interface, so that's potentially an expensive addition - some SDKs may end up having to C&P a bunch.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @rajagp said.n It would be great to have consistency in function names ftsQuery, geoDistance, vectorQuery, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the existing SearchQuery including geo I've used the existing names so the large corpus of existing SDK code based on it can be refused. Plus it is somewhat orthogonal to the vector search work, and I didn't want to unduly scope creep given the timescales required.

It's worth noting that this proposal was originally presented for PM review last year, and we are due to begin implementing tomorrow. I can still make small bug fixes but it is too late for feedback that will require such large ranging changes - unless everything is pushed back by a sprint. Let me know if you want to go that path; I'm happy to discuss renaming if given time to do it, though will need details on the desired names from PMs.

### VectorSearch
`VectorSearch` construction is platform-idiomatic:

`VectorSearch.create(List<VectorQuery> vectorQueries, [VectorSearchOptions options])`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this also a fluent style API? Where there would be no VectorSearchOptions but myVectorSearch.vectorQueryCombination(...) instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it would not use fluent-style anywhere: VectorQuery + VectorQueryOptions, and VectorSearch + VectorSearchOptions are the ideal, as they are more idiomatic to SDK3.

But, the Java SDK for some reason uses a fluent-style API for SearchQuerys, but an options block for SearchOptions. Don't ask me why! So I've tried to give enough latitude in the RFC for people to be able to implement something that ideally looks like an SDK3 API, but can instead fit the idiomatic style where needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS for the creation of the SearchRequest, fluent-style is recommended. This is "A and/or B [and/or C, D...] major top-level feature" is a new concept to the SDKs AFAIK, and fluent-style seems the best way of handling it.

"search": ["vectorSearch"]
}
```
If it is not present the SDK will raise `FeatureNotAvailableException` with a message along the lines of "Vector queries are available from Couchbase Server 7.6.0 and above".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the clusterCaps included vectorSearch and scopedSearchIndex on 7.6.0+ if there are no search nodes?

If it is not present the SDK will raise `FeatureNotAvailableException` with a message along the lines of "Vector queries are available from Couchbase Server 7.6.0 and above".

The SDK will check for this capability in the most recent config it received.
If it does not currently have one (in which case one should already be in the process of being asynchronously fetched), the SDK will wait for it to be available.
Copy link
Contributor

@RiPont RiPont Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like waiting for the config adds edge cases and potential performance problems for not much benefit.

If there is no config, chances are nothing is bootstrapped yet and the whole thing will fail anyways.

@chvck
Copy link
Contributor

chvck commented May 21, 2024

@programmatix could you tidy the commit history a bit please?

@dnault dnault self-requested a review May 21, 2024 19:04
@programmatix programmatix merged commit ea669e1 into couchbaselabs:master May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants