Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dandi export and read #956

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

Dandi export and read #956

wants to merge 30 commits into from

Conversation

samuelbray32
Copy link
Collaborator

@samuelbray32 samuelbray32 commented May 7, 2024

Description

Partial steps for #861

  • Step (3)

    • makes new table DandiPath
      • tracks translations between analysis file name and renamed path in a dandiset
      • has method to stream data from dandi
    • makes function Export.compile_dandiset()
      • Uses dandi api functions to get dandi-compliant names for files
      • organizes and uploads files to dandi
      • populates DandiPath
    • makes admin function Export._prepare_files_for_export()
  • Step (4)

    • adds new fallback to fetch_nwb which streams file from dandi if available

Checklist:

  • This PR should be accompanied by a release: unsure
  • If release, I have updated the CITATION.cff
  • This PR makes edits to table definitions: no
  • N/A If table edits, I have included an alter snippet for release notes.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes
  • Change the dandi url's from development to production server

@edeno
Copy link
Collaborator

edeno commented May 8, 2024

I know this is still a WIP, but we should also decide whether to make dandi a dependency or an optional dependency.

definition = """
-> Export.File
---
dandiset_id: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dandiset_id the same for all files in a given Export? If so, I might avoid some redundancy by having DandiPath be a master table with one part per Export.File

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, the dandiset ID is common for everything in export. However I could see a future case where an analysis file is used in multiple papers and rather than re-uploading a new copy of the file to dandi you would point to it's existing version in a different dandiset. I might lean towards leaving the flexibility for this in the table structure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does dandi handle this case? Presumably, they wouldn't want to store two copies of the same data file across dandisets. But, if the name is contextually determined by other files, the name would have to differ. I wouldn't be surprised if they had cross-dandiset soft-link tool, but I don't know for sure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have different names on Dandi, but they would share the same object_id for the file. I haven't tested if that blocks upload.

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a quick first pass to keep up to speed

src/spyglass/common/common_usage.py Outdated Show resolved Hide resolved
src/spyglass/common/common_dandi.py Outdated Show resolved Hide resolved
src/spyglass/sharing/sharing_kachery.py Outdated Show resolved Hide resolved
src/spyglass/sharing/sharing_kachery.py Outdated Show resolved Hide resolved
src/spyglass/utils/nwb_helper_fn.py Outdated Show resolved Hide resolved
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@samuelbray32 samuelbray32 marked this pull request as ready for review May 20, 2024 21:17
@edeno edeno requested a review from CBroz1 May 20, 2024 21:18
@samuelbray32
Copy link
Collaborator Author

I'm leaving the functions pointing to the Dandi dev server for now until the other parts of the code are reviewed.

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, @samuelbray32. I had some comments, but I think the heaviest lifting would be migrating it all to common_dandi to revise the dependency order. I'm happy to help migrating to a more object-oriented approach so the helpers can share properties

notebooks/py_scripts/05_Export.py Outdated Show resolved Hide resolved
notebooks/py_scripts/05_Export.py Show resolved Hide resolved
# >
# >To aid in correcting common formatting errors identified with changes in dandi standards, we have included the method
# ```
# Export()._prepare_files_for_export(paper_key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a func we think users will adopt, I'd remove the _ prefix that implies 'this is just for other funcs to call'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using this to try to indicate it's function that shouldn't be used frequently since it requires admin permissions. Open to other ideas of how to indicate this or can remove

notebooks/py_scripts/05_Export.py Outdated Show resolved Hide resolved
src/spyglass/common/common_dandi.py Outdated Show resolved Hide resolved
# make a temp dir with symbolic links to the export files
source_files = (self.File() & key).fetch("file_path")
paper_dir = f"{export_dir}/{paper_id}"
os.makedirs(paper_dir, exist_ok=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If replacing os with pathlib, you can pathlib_dir.mkdir(parents=True, exist_ok=True)

src/spyglass/common/common_usage.py Outdated Show resolved Hide resolved
src/spyglass/common/common_usage.py Outdated Show resolved Hide resolved
src/spyglass/common/common_usage.py Outdated Show resolved Hide resolved
translations = [
{
**(
self.File() & key & f"file_path LIKE '%{t['filename']}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My file_like helper would work here i think

@samuelbray32
Copy link
Collaborator Author

Thanks @CBroz1. I can work on moving the functionality into dandi_common. For another common comment, dandi is already a dependency in pynwb so it shouldn't cause issues for import

@rly
Copy link
Collaborator

rly commented May 22, 2024

FYI dandi is not a dependency of pynwb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants