Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support FullText search on SQLite #723

Merged
merged 20 commits into from
May 25, 2024
Merged

Conversation

danielballan
Copy link
Member

@danielballan danielballan commented Apr 19, 2024

This PR has commits from @Kezzsim and me. It adds support for full text search backed by SQLite.

SQLite full text search support via FTS5 is solid and feature-ful, but the implementation is somewhat different. It employs a VIRTUAL TABLE instead of an INDEX. Database triggers are used to keep the content of this table in sync with the nodes table. We must perform a JOIN on that virtual table in order to filter our results with a MATCH query against the indexed node metadata.

We get queries that look like:

SELECT nodes."key" 
FROM nodes JOIN metadata_fts5 ON metadata_fts5.rowid = nodes.id 
WHERE nodes.ancestors = ? AND metadata_fts5.metadata MATCH ? ORDER BY nodes.time_created, nodes.id
 LIMIT ? OFFSET ?

(as shown by setting TILED_ECHO_SQL=1 when starting the server).

I think the implementation is about as clean as it can be, until we go and refactor all the SQLAlchemy code into a separate module. But I'd like to kick the tires a bit more before we merge.

We also need to test the migration interactively, and particularly ensure that existing rows in the nodes table are added to the VIRTUAL TABLE.

Closes #546

@danielballan
Copy link
Member Author

danielballan commented May 14, 2024

Locally, I am seeing these test failures. I think there is something race-y about table creation. The failures are not all reproducible if run in isolation.

FAILED tiled/_tests/test_access_control.py::test_create_and_update_allowed - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: main.metadata_ft5_index
FAILED tiled/_tests/test_object_cache.py::test_detect_content_changed_or_removed - assert 1 == 2
FAILED tiled/_tests/test_queries.py::test_full_text[sqlite] - tiled.client.utils.ClientError: 400: The query type 'full_text' is not supported on this node. http://local-tiled-app/api/v...
FAILED tiled/_tests/test_writing.py::test_write_dataframe_partitioned - dask.base.TokenizationError: Object <function Container.write_dataframe.<locals>.write_partition at 0x7a554502b420> cannot ...
FAILED tiled/_tests/test_writing.py::test_limits - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: main.metadata_ft5_index
FAILED tiled/_tests/test_writing.py::test_metadata_revisions - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: main.metadata_ft5_index

@danielballan
Copy link
Member Author

Some of the above failures are unrelated, and simply due to this branch being old and carrying some upstream incompatibility issues that have been fixed on main. I have rebased on main.

@danielballan danielballan force-pushed the smse_sqlite branch 2 times, most recently from 0b0ac2e to f799703 Compare May 14, 2024 20:50
@danielballan
Copy link
Member Author

danielballan commented May 14, 2024

Unit tests pass locally for me now.

  • The migration tiled/catalog/migrations/versions/ed3a4223a600_create_sqlite_table_for_fulltext_search.py must be extended to index all the existing rows in nodes at the time at the migration is running, effectively like running the trigger retroactively for all the rows that were added before we starting indexing them.
  • Test for each of the triggers
    • A node deleted no longer appears in full text search results.
    • A node that has had its metadata updated matches its new value but not its old value.

"""
INSERT INTO metadata_fts5(rowid, metadata)
SELECT id, metadata FROM nodes;
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing trialing comma after close-quotes.

@danielballan
Copy link
Member Author

For reference I wanted to drop this link here --- this is where I found the approach we settled on for creating a VIRTUAL TABLE via SQLAlchemy: sqlalchemy/sqlalchemy#9466

@danielballan danielballan force-pushed the smse_sqlite branch 3 times, most recently from f1e8be8 to 7eeb68f Compare May 25, 2024 11:01
@danielballan danielballan merged commit cdee371 into bluesky:main May 25, 2024
9 checks passed
@danielballan danielballan deleted the smse_sqlite branch May 25, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants