Check for mapping conflicts for Files and Directories in the actual PathMapper #4888

Open
wants to merge 41 commits into
base: master
from

adamnovak
Member

This should fix #4864's problem by making the CWL `_2` name generation actually go on to `_3`, `_4`, etc.

It should also make it apply to directories, by applying it before we actually do the file or directory specific logic, at the point where we generate the target path.

It should also make it properly distinguish between re-mapping the same thing and mapping a different thing with the same name.
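The naming behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `pick_target` and the `taken` dictionary are hypothetical names, and appending the suffix to the end of the whole target path is an assumption. The sketch never looks at whether the source is a file or a directory, so the same rule covers both.

```python
from typing import Dict


def pick_target(source: str, desired: str, taken: Dict[str, str]) -> str:
    """
    Return a target path for `source`, starting from `desired`.

    `taken` maps already-assigned target paths to the source that owns them.
    All names here are hypothetical, not Toil's or cwltool's API.
    """
    candidate = desired
    suffix = 1
    while candidate in taken and taken[candidate] != source:
        # A different source already owns this target; try _2, _3, _4, ...
        suffix += 1
        candidate = f"{desired}_{suffix}"
    # Either the target is free, or we are re-mapping the same source.
    taken[candidate] = source
    return candidate


taken: Dict[str, str] = {}
first = pick_target("file:///a/out.txt", "/work/out.txt", taken)
second = pick_target("file:///b/out.txt", "/work/out.txt", taken)
third = pick_target("file:///c/out.txt", "/work/out.txt", taken)
# Mapping the same source again reuses its existing target.
again = pick_target("file:///b/out.txt", "/work/out.txt", taken)
```

Tracking which source owns each target is what lets the loop tell a repeated mapping of the same thing apart from a genuinely different thing that happens to share a name.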

Changelog Entry

To be copied to the draft changelog by merger:

  • toil-cwl-runner can now handle more than 2 files with the same name, or any number of directories with the same name.

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the GitHub "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak adamnovak mentioned this pull request Apr 29, 2024
19 tasks
@adamnovak adamnovak requested a review from mr-c April 29, 2024 14:46
Contributor

@mr-c mr-c left a comment


Thank you for looking at this, @adamnovak. It looks like this PR is moving around code from cwltool.process.stage_files; maybe it would be better to leave that out and adjust toilStageFiles to use cwltool.process.relocateOutputs? I'm happy to add docs and/or refactor that function to be more useful to toil-cwl-runner.

@adamnovak
Member Author

I'm not pulling any new code from cwltool.process.stage_files here, but it looks like the code I'm moving was originally there, and it got pulled into Toil in a non-functional way.

I don't feel confident in my ability to refactor toilStageFiles to call cwltool.process.relocateOutputs without a strong description of what the semantics of that function are, exactly. What is it supposed to be relocating, and from where to where, and why? Does that function actually guarantee that something sensible is done with files or directories with conflicting names? Can it support a destination_path that is only accessible through the StdFsAccess implementation, as Toil would need in order to write to S3?
