Implement a SQL, Qdrant SQL? #4026

raulcarlomagno · 2024-04-12T11:06:32Z

I think it could be very useful to have a QSQL, or being optimist it could become an standard for vector databases, VSQL?
Instead of dealing with JSON in qdrant http api or all the methods in Python API for example. Queries can be written easily in the same way for scroll api and search api, in plain SQL. It can be useful for the qdrant dashboard or even for production queries.
The generated abstract syntax tree (AST), can match all qdrant features for searching and scrolling.

SELECT score, vector, id, payload.brand_name
FROM mycollection
WHERE vector LIKE [0.12, 0.1, 0.99, -0.01]

field score is a virtual calculated field that is returned when using LIKE for vectors

in the case of having named vectors, it would be like:

SELECT score, vectors.img_vec, id, payload.qty_items
FROM mycollection
WHERE vectors.img_vec LIKE [0, 1.2, -0.2, 0.001]

limit, offset and filtering payload

SELECT score, id, payload.qty_items
FROM mycollection
WHERE vector LIKE [0.1, 0.2, -0.3, 0.11] AND payload.brand_name = 'Nokia'
LIMIT 50
OFFSET 20

all payload and filtering

SELECT score, id, payload.*
FROM mycollection
WHERE vector LIKE [0.11, -0.2, 0.3, 0.22] AND payload.brand_name = 'Nokia'
LIMIT 100

complex filtering

SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (payload.brand_name IN ('Nokia', 'Alcatel', 'Sony') OR payload.qty_items >= 10)
AND payload.members IS NULL
LIMIT 100

scroll api (no vector similarity search). Filtering fulltext, values count and range, plus sorting

SELECT id, payload.*
FROM mycollection
WHERE payload.brand_name MATCH 'cell'
AND LENGTH(payload.members) > 100
AND payload.qty_items BETWEEN 10 AND 20
ORDER BY payload.members DESC
LIMIT 100

geo bounding box filtering

SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (
payload.geofield INSIDE RECTANGLE(52.520711, 13.403683, 52.495862, 13.455868)
OR
payload.geofield INSIDE CIRCLE(52.520711, 13.403683, 1000)
OR
payload.geofield OUTSIDE POLYGON([12.444, 54.12], [24.77, 18.222], [99.91, 12.2])
)

generall · 2024-04-13T09:08:46Z

Hey @raulcarlomagno, thanks for bringing that up!

I recognize the problem of JSON filters being complicated to write sometimes, especially with explicit pydantic types in python.

I, however, don't think sql-like language is a right answer to this problem, for the following reasons:

Original SQL was designed for a very different purposes

it expects data to be normalized, meaning each table is flatten. This is not the case for most search engines.
vector dbs will support only a subset of original SQL functionality, which will make it basically a syntactic "sugar"
Along with defining what to retrieve, search engines frequently require instructions of how to retrieve something. E.g. by applying re-ranking step of some kind. This is barely compatible with SQL-like languages
It is very hard to programmatically edit and modify sql-like language, compared to json

Instead of doing this, I would rather:

simplify the json language, like allow not to use redundant arrays e.t.c.
introduce a DSL-type of helpers in client libraries
improve autocomplete in web-ui

Please feel free to share any additional arguments why would you think the sql is a better alternative

raulcarlomagno · 2024-04-18T08:46:22Z

thank you for answering @generall
the main reason would be payload filtering, as you said, json gets "complicated" for advanced filtering use cases
when working on jupyter notebooks, analyzing, researching, even in the ui. It is more simple to go back to the old SQL dialect for querying, as we see that payload+vectors are getting much closer in terms of data, at the end, it is a vector DATABASE. So offering an alternative way for querying it can be seen as added value for the product.
maybe a DSL-type helper can be useful
anyways, this is just my humble opinion as a user, like when i asked why IDs can't be strings... :)

if i have time i will try to make an sql parser for the python client, but my initial thought was about having this parser on server side, to send sql queries through http or grpc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a SQL, Qdrant SQL? #4026

Implement a SQL, Qdrant SQL? #4026

raulcarlomagno commented Apr 12, 2024 •

edited

generall commented Apr 13, 2024

raulcarlomagno commented Apr 18, 2024

Implement a SQL, Qdrant SQL? #4026

Implement a SQL, Qdrant SQL? #4026

Comments

raulcarlomagno commented Apr 12, 2024 • edited

generall commented Apr 13, 2024

raulcarlomagno commented Apr 18, 2024

raulcarlomagno commented Apr 12, 2024 •

edited