New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for more input sources in DirectoryReader #283
Add support for more input sources in DirectoryReader #283
Conversation
…utils in DirectoryReader
This pull request has been linked to Shortcut Story #40938: Object API: Add support for importing from group of TileDB Files. |
This pull request has been linked to Shortcut Story #42043: Trigger Task Graph for Indexing (Ingestion). |
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
# Stop if we have found max_count files | ||
if max_files is not None and len(results) >= max_files: | ||
return | ||
obj_type = tiledb.object_type(uri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a scoped context. Likely taking a context as a function parameter would work to pass it as a scoped ctx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all config and context passing from DirectoryReaders
.
Readers shouldn't own or manage the TileDB configuration. Rather the config is passed through the ingestion or index functions. Some reason for this:
- Reader arguments are serialized and stored in group metadata. Serializing TileDB config would present a risk of exposing user secrets.
- Multiple users can use the same index with different configs (i.e. each user having their credentials for s3). The Reader and its arguments are the same for all users of an Index
How it works currently:
- Config is passed to
ObjectIndex
methods (__init__
,create
,update_index
). - We perform all reader operations within
with tiledb.scope_ctx(ctx_or_config=config)
- Readers should expect a
default_ctx
to be set appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation!
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
apis/python/src/tiledb/vector_search/object_readers/directory_reader.py
Outdated
Show resolved
Hide resolved
Great work, thanks for the changes! I've left some minor comments, mostly around the context handling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me once comments from Seth are fixed as well.
This adds support for listing TileDB groups
It also uses the same listing argument structure as the utilities used in VCF ingestion
[sc-42043]