Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a SQL, Qdrant SQL? #4026

Open
raulcarlomagno opened this issue Apr 12, 2024 · 2 comments
Open

Implement a SQL, Qdrant SQL? #4026

raulcarlomagno opened this issue Apr 12, 2024 · 2 comments

Comments

@raulcarlomagno
Copy link

raulcarlomagno commented Apr 12, 2024

I think it could be very useful to have a QSQL, or being optimist it could become an standard for vector databases, VSQL?
Instead of dealing with JSON in qdrant http api or all the methods in Python API for example. Queries can be written easily in the same way for scroll api and search api, in plain SQL. It can be useful for the qdrant dashboard or even for production queries.
The generated abstract syntax tree (AST), can match all qdrant features for searching and scrolling.

SELECT score, vector, id, payload.brand_name
FROM mycollection
WHERE vector LIKE [0.12, 0.1, 0.99, -0.01]

field score is a virtual calculated field that is returned when using LIKE for vectors

in the case of having named vectors, it would be like:

SELECT score, vectors.img_vec, id, payload.qty_items
FROM mycollection
WHERE vectors.img_vec LIKE [0, 1.2, -0.2, 0.001]

limit, offset and filtering payload

SELECT score, id, payload.qty_items
FROM mycollection
WHERE vector LIKE [0.1, 0.2, -0.3, 0.11] AND payload.brand_name = 'Nokia'
LIMIT 50
OFFSET 20

all payload and filtering

SELECT score, id, payload.*
FROM mycollection
WHERE vector LIKE [0.11, -0.2, 0.3, 0.22] AND payload.brand_name = 'Nokia'
LIMIT 100

complex filtering

SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (payload.brand_name IN ('Nokia', 'Alcatel', 'Sony') OR payload.qty_items >= 10)
AND payload.members IS NULL
LIMIT 100

scroll api (no vector similarity search). Filtering fulltext, values count and range, plus sorting

SELECT id, payload.*
FROM mycollection
WHERE payload.brand_name MATCH 'cell'
AND LENGTH(payload.members) > 100
AND payload.qty_items BETWEEN 10 AND 20
ORDER BY payload.members DESC
LIMIT 100

geo bounding box filtering

SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (
payload.geofield INSIDE RECTANGLE(52.520711, 13.403683, 52.495862, 13.455868)
OR
payload.geofield INSIDE CIRCLE(52.520711, 13.403683, 1000)
OR
payload.geofield OUTSIDE POLYGON([12.444, 54.12], [24.77, 18.222], [99.91, 12.2])
)
@generall
Copy link
Member

Hey @raulcarlomagno, thanks for bringing that up!

I recognize the problem of JSON filters being complicated to write sometimes, especially with explicit pydantic types in python.

I, however, don't think sql-like language is a right answer to this problem, for the following reasons:

Original SQL was designed for a very different purposes

  • it expects data to be normalized, meaning each table is flatten. This is not the case for most search engines.
  • vector dbs will support only a subset of original SQL functionality, which will make it basically a syntactic "sugar"
  • Along with defining what to retrieve, search engines frequently require instructions of how to retrieve something. E.g. by applying re-ranking step of some kind. This is barely compatible with SQL-like languages
  • It is very hard to programmatically edit and modify sql-like language, compared to json

Instead of doing this, I would rather:

  • simplify the json language, like allow not to use redundant arrays e.t.c.
  • introduce a DSL-type of helpers in client libraries
  • improve autocomplete in web-ui

Please feel free to share any additional arguments why would you think the sql is a better alternative

@raulcarlomagno
Copy link
Author

thank you for answering @generall
the main reason would be payload filtering, as you said, json gets "complicated" for advanced filtering use cases
when working on jupyter notebooks, analyzing, researching, even in the ui. It is more simple to go back to the old SQL dialect for querying, as we see that payload+vectors are getting much closer in terms of data, at the end, it is a vector DATABASE. So offering an alternative way for querying it can be seen as added value for the product.
maybe a DSL-type helper can be useful
anyways, this is just my humble opinion as a user, like when i asked why IDs can't be strings... :)

if i have time i will try to make an sql parser for the python client, but my initial thought was about having this parser on server side, to send sql queries through http or grpc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants