-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding vector search #132
Adding vector search #132
Conversation
This is currently uploaded just for initial review by the SDK PM(s). No review is requested from the wider team at this time. |
Review from the wider SDK team is now welcome following initial PM review. |
## Vector search | ||
This is a feature being added to Couchbase Server 7.6 in the FTS service. | ||
|
||
The feature will initially be released in a Private Preview, and all SDK additions related to it should be annotated with the platform equivalent of @Stability.Volatile as changes may well be required following user feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the Private Preview may be no more, and that this proposed stability level of the API is now somewhat controversial as a result. Personally I don't feel we should be adding a brand-new API for a potentially volatile feature directly at committed level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Feature is definitely not private (it is public but could be be Beta or GA). Regardless as per our discussions, we should still mark APIs as volatile.
rfc/0052-sdk3-full-text-search.md
Outdated
|
||
Important SDK considerations: | ||
|
||
* There is some uncertainty on whether it is mandatory to provide a `query` field containing a 'normal' FTS query. This is being followed up with the FTS team. As it makes sense to be able to perform solely a vector search, we will assume for now that will be the case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jon-strabala can you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised this as a ticket as I've been trying to discover this info since 20th Dec and we now urgently need a response to this.
|
||
* Performing a vector search in isolation (without a traditional FTS query). | ||
* A SQL++ statement is used to also perform a vector search. | ||
* This is just passed in the string statement and is expected to return the same query results JSON as at present, so from the SDK POV it's just a passthrough with no SDK modifications needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jon-strabala can you confirm that the query results will be the same in this mode?
rfc/0052-sdk3-full-text-search.md
Outdated
### VectorQuery | ||
`VectorQuery` creation is platform-idiomatic: | ||
|
||
`VectorQuery.create(String vectorFieldName, float[] vector)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to be more specific and support potential integrations with vector libraries, why not VectorQuery.fromVector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mentioned that construction is platform-idiomatic and this is intended to include the "VectorQuery.create(...)" naming. If new VectorQuery(...)
, or VectorQuery.fromX(...)
, or some other form, is more idiomatic and better fits the SDK's existing FTS creation methods, it should use such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SDKs which use them, is this float64 or float32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float32
rfc/0052-sdk3-full-text-search.md
Outdated
|
||
The first parameter needs to be mandatory. | ||
|
||
`VectorSearch.create(VectorQuery vectorQuery, [VectorSearchOptions options])` can be added to make it easier to send a single `VectorQuery`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this accept VectorSearchOptions (which only has the operator, which would be unused with a single VectorQuery object)? Relatedly, in the case of multiple queries, should the operator being used between them not be a more high-level object rather than being part of general options? For instance:
VectorSearch.fromMultiple(VectorQueryCombination, List<VectorQuery>)
VectorSearch.create(VectorQuery)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this accept VectorSearchOptions (which only has the operator, which would be unused with a single VectorQuery object)
It's forward-looking to when more options will be present in future, but yes, it's not useful right now. Since discovering what a hard time some SDKs have with adding overloads later, I'm trying to anticipate these situations.
I don't have strong feelings on it - can certainly remove it from this optional sugar.
I'm not a big fan of the fromMultiple
/ VectorQueryCombination idea, I think, for similar reasons - there may well be future options.
SearchRequest request = SearchRequest | ||
.searchQuery(SearchQuery.matchAll()) | ||
.vectorSearch(VectorSearch.create(VectorQuery.create("vector_field", vector))); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we attached to searchQuery
for historical reasons (as in what we have today so less ).It would be ideal to use ftsSearch
and ftsQuery
to align with vectorSearch
/vectorQuery
respectively- no?
A new developer reading the API without any of the context would find it hard to understand that searchQuery is actually FTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly - Ray and I have certainly wondered before why it's named "search" and not "fullTextSearch" or similar - but it potentially opens a rather large can of worms. .ftsSearch(SearchQuery.matchAll())
or .fullTextSearch(SearchQuery.matchAll())
both look weirdly inconsistent to me - would probably want something instead like .fullTextSearch(FullTextSearch[Query].matchAll())
, which looks cleaner. But we have a lot of classes implementing that SearchQuery
interface, so that's potentially an expensive addition - some SDKs may end up having to C&P a bunch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to what @rajagp said.n It would be great to have consistency in function names ftsQuery, geoDistance, vectorQuery, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the existing SearchQuery including geo I've used the existing names so the large corpus of existing SDK code based on it can be refused. Plus it is somewhat orthogonal to the vector search work, and I didn't want to unduly scope creep given the timescales required.
It's worth noting that this proposal was originally presented for PM review last year, and we are due to begin implementing tomorrow. I can still make small bug fixes but it is too late for feedback that will require such large ranging changes - unless everything is pushed back by a sprint. Let me know if you want to go that path; I'm happy to discuss renaming if given time to do it, though will need details on the desired names from PMs.
### VectorSearch | ||
`VectorSearch` construction is platform-idiomatic: | ||
|
||
`VectorSearch.create(List<VectorQuery> vectorQueries, [VectorSearchOptions options])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this also a fluent style API? Where there would be no VectorSearchOptions but myVectorSearch.vectorQueryCombination(...) instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally it would not use fluent-style anywhere: VectorQuery + VectorQueryOptions, and VectorSearch + VectorSearchOptions are the ideal, as they are more idiomatic to SDK3.
But, the Java SDK for some reason uses a fluent-style API for SearchQuery
s, but an options block for SearchOptions. Don't ask me why! So I've tried to give enough latitude in the RFC for people to be able to implement something that ideally looks like an SDK3 API, but can instead fit the idiomatic style where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS for the creation of the SearchRequest, fluent-style is recommended. This is "A and/or B [and/or C, D...] major top-level feature" is a new concept to the SDKs AFAIK, and fluent-style seems the best way of handling it.
…ds with the same name
"search": ["vectorSearch"] | ||
} | ||
``` | ||
If it is not present the SDK will raise `FeatureNotAvailableException` with a message along the lines of "Vector queries are available from Couchbase Server 7.6.0 and above". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the clusterCaps included vectorSearch and scopedSearchIndex on 7.6.0+ if there are no search nodes?
If it is not present the SDK will raise `FeatureNotAvailableException` with a message along the lines of "Vector queries are available from Couchbase Server 7.6.0 and above". | ||
|
||
The SDK will check for this capability in the most recent config it received. | ||
If it does not currently have one (in which case one should already be in the process of being asynchronously fetched), the SDK will wait for it to be available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like waiting for the config adds edge cases and potential performance problems for not much benefit.
If there is no config, chances are nothing is bootstrapped yet and the whole thing will fail anyways.
@programmatix could you tidy the commit history a bit please? |
No description provided.