Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for more input sources in DirectoryReader #283

Merged
merged 12 commits into from Mar 27, 2024

Conversation

NikolaosPapailiou
Copy link
Collaborator

@NikolaosPapailiou NikolaosPapailiou commented Mar 14, 2024

This adds support for listing TileDB groups

It also uses the same listing argument structure as the utilities used in VCF ingestion

[sc-42043]

Copy link

Copy link

This pull request has been linked to Shortcut Story #42043: Trigger Task Graph for Indexing (Ingestion).

# Stop if we have found max_count files
if max_files is not None and len(results) >= max_files:
return
obj_type = tiledb.object_type(uri)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a scoped context. Likely taking a context as a function parameter would work to pass it as a scoped ctx.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed all config and context passing from DirectoryReaders.

Readers shouldn't own or manage the TileDB configuration. Rather the config is passed through the ingestion or index functions. Some reason for this:

  • Reader arguments are serialized and stored in group metadata. Serializing TileDB config would present a risk of exposing user secrets.
  • Multiple users can use the same index with different configs (i.e. each user having their credentials for s3). The Reader and its arguments are the same for all users of an Index

How it works currently:

  • Config is passed to ObjectIndex methods (__init__, create, update_index).
  • We perform all reader operations within with tiledb.scope_ctx(ctx_or_config=config)
  • Readers should expect a default_ctx to be set appropriately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

@Shelnutt2
Copy link
Member

Shelnutt2 commented Mar 27, 2024

Great work, thanks for the changes! I've left some minor comments, mostly around the context handling.

Copy link
Contributor

@jparismorgan jparismorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me once comments from Seth are fixed as well.

@NikolaosPapailiou NikolaosPapailiou merged commit d8f58b5 into main Mar 27, 2024
6 checks passed
@NikolaosPapailiou NikolaosPapailiou deleted the npapa/sc-40938/group-and-directory-listing branch March 27, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants