Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrations: LanceDB update #13416

Closed
wants to merge 22 commits into from

Conversation

raghavdixit99
Copy link
Contributor

@raghavdixit99 raghavdixit99 commented May 10, 2024

Feature update :

  • LanceDB Cloud support
  • from_table - creates index from existing table
  • metadata filtering logic
  • hybrid search w reranking support
  • create vector as well as scalar index on table

Updated demo notebook ( tested locally ) and tested via make commands.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 10, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@raghavdixit99 raghavdixit99 marked this pull request as draft May 11, 2024 06:03
@raghavdixit99 raghavdixit99 marked this pull request as ready for review May 11, 2024 15:18
@raghavdixit99
Copy link
Contributor Author

@ravi03071991 review pls

@raghavdixit99 raghavdixit99 marked this pull request as draft May 23, 2024 19:10
@raghavdixit99 raghavdixit99 marked this pull request as ready for review May 23, 2024 20:50
)

if table:
assert isinstance(table, lancedb.db.LanceTable)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a complimentary error msg here

if query_type == "vector":
_query = query.query_embedding
else:
self._table.create_fts_index(self.text_key, replace=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will create index every time query is called, let's keep track of index if that's created in an instance variable and not recreate index every time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is also done because if the user adds more data, a full reindex is needed from our(lance) end, creating it every time is also not the best way but we would have to handle this in the future.
Resolving it as per your comment as of now thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm..actually just set the flag to False every time table.add is called? just use fts_index_created or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I actually just did that haha, basically setting the instance variable back to default and on query time we are checking the variable status

self._table.create_fts_index(self.text_key, replace=True)
if query_type == "hybrid":
_query = (query.query_embedding, query.query_str)
else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif query_type="fts" and else: raise ValueError("invalid query_type")


def __init__(
self,
uri: Optional[str],
table_name: str = "vectors",
connection: Optional[Any] = None,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will changing the order of args affect backwards compat? If so, let's try to add the new args towards the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was supposed to be a minor breaking change to synch with other integrations, thanks ill change it back.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk sounds good

@raghavdixit99 raghavdixit99 marked this pull request as draft May 24, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants