Output paths from reader tasks get moved and rewritten #751

Open
ghisvail opened this issue Apr 19, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@ghisvail
Collaborator

I have implemented tasks that read a set of files from a BIDS dataset. Their signatures are:

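```python
from pathlib import Path

from pydra import Workflow
from pydra.mark import annotate, task


@task
@annotate({"return": {
    "dataset_description": dict,
    "participant_ids": list[str],
    "session_ids": list[str],
}})
def read_bids_dataset(dataset_path: Path):
    # Returns the dataset description and the available participant/session IDs.
    ...

@task
@annotate({"return": {"files": list[Path]}})
def read_bids_files(
    dataset_path: Path,
    participant_id: str,
    session_id: str,
    datatype: str,
    suffix: str,
    extension: str,
):
    # Returns the paths of the matching files inside the dataset.
    ...

# Build workflow composing the two tasks above
def build_bids_reader(bids_queries: dict, **kwargs) -> Workflow:
    ...
```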
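The following plain-Python sketch shows the kind of lookup a task like `read_bids_files` is assumed to perform. It is not the author's implementation. The helper name `find_bids_files` is hypothetical, Pydra is not involved, and the sketch assumes IDs are passed without their `sub-`/`ses-` prefixes and that files follow the usual BIDS layout `sub-<label>/ses-<label>/<datatype>/sub-<label>_ses-<label>[_<entities>]_<suffix><extension>`. It illustrates that such a reader naturally returns absolute paths *inside* the source dataset.

```python
from pathlib import Path


def find_bids_files(
    dataset_path: Path,
    participant_id: str,
    session_id: str,
    datatype: str,
    suffix: str,
    extension: str,
) -> list[Path]:
    """Hypothetical BIDS lookup returning absolute paths of matching files.

    Assumes IDs are given without the "sub-"/"ses-" prefixes and the layout
    sub-<p>/ses-<s>/<datatype>/sub-<p>_ses-<s>[_<entities>]_<suffix><extension>.
    """
    root = Path(dataset_path).resolve()  # absolute path to the dataset root
    folder = root / f"sub-{participant_id}" / f"ses-{session_id}" / datatype
    # Optional entities (run-, acq-, ...) between session and suffix are
    # matched by the wildcard; a missing folder simply yields no matches.
    pattern = f"sub-{participant_id}_ses-{session_id}*_{suffix}{extension}"
    return sorted(folder.glob(pattern))
```

Because the returned paths point into the dataset itself, there is nothing to gain from copying them elsewhere before handing them to the next task.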

If I run both tasks sequentially by hand, I get the list of BIDS files at their original location in the source dataset, as expected.

If I compose them in a workflow, I still get the BIDS files, but they have been moved to the workflow working directory.
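One way to make the difference between the two runs concrete is to check where the returned paths actually live. The helper below is hypothetical and not part of Pydra. It lists the returned files that no longer sit under the source dataset, so it should return an empty list for the sequential run. In the workflow run described above, it would instead return the relocated files.

```python
from pathlib import Path


def files_outside_dataset(files: list[Path], dataset_path: Path) -> list[Path]:
    """Return the files that do not live under dataset_path (hypothetical helper)."""
    root = Path(dataset_path).resolve()
    outside = []
    for f in files:
        # resolve() makes relative paths absolute against the current directory
        # and follows symlinks, so real locations are compared.
        if not Path(f).resolve().is_relative_to(root):
            outside.append(Path(f))
    return outside
```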

I have never seen this behavior before, and I believe it may be a regression compared to versions of Pydra prior to 0.23. In my opinion, running the tasks sequentially and running them as a workflow should give equivalent results. Moreover, copying the BIDS files can become a serious problem if the dataset is large in terms of the number of participant/session combinations, or if the queried modality involves large data volumes, such as DWI.

A quick debugging session suggests that this area of the code may be the cause.

@tclose
Contributor

tclose commented Apr 19, 2024

So is the problem that relative paths are treated as being relative to the internal working directory instead of the working directory the workflow is launched from, or are absolute paths also being treated as relative to the internal directory?
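The distinction in this question comes down to how path joining works. Joining a base directory with a relative path anchors the path to that base. Joining with an absolute path discards the base entirely. The short `pathlib` illustration below uses made-up directory names. `PurePosixPath` keeps the output identical on every OS.

```python
from pathlib import PurePosixPath

launch_dir = PurePosixPath("/home/user/project")  # where the workflow is launched (illustrative)
work_dir = PurePosixPath("/tmp/pydra/wf_cache")   # internal working directory (illustrative)

relative = PurePosixPath("sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz")
absolute = PurePosixPath("/data/bids/sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz")

# The same relative path names different files depending on the base it is joined to.
print(launch_dir / relative)  # /home/user/project/sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
print(work_dir / relative)    # /tmp/pydra/wf_cache/sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz

# Joining with an absolute path ignores the base, so the result is unambiguous.
print(work_dir / absolute)    # /data/bids/sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
```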

@ghisvail
Collaborator Author

> So is the problem that relative paths are treated as being relative to the internal working directory instead of the working directory the workflow is launched from, or are absolute paths also being treated as relative to the internal directory?

The former, I believe. Absolute paths should be left untouched. Relative paths (possibly generated by the task) should be made absolute by the current copy mechanism, relative to the task or workflow directory. This way, files are always passed between tasks or workflows as absolute paths, which avoids potentially expensive copies.
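Below is a minimal sketch of the proposed rule. `normalise_output_paths` is a hypothetical helper, not Pydra's actual output-handling code. It assumes outputs are `Path` objects, possibly nested in lists, tuples or dicts, and leaves plain strings alone. Absolute paths pass through unchanged, and only relative paths are anchored to the task or workflow output directory.

```python
from pathlib import Path
from typing import Any


def normalise_output_paths(value: Any, output_dir: Path) -> Any:
    """Hypothetical: make relative Path outputs absolute, leave absolute ones alone."""
    if isinstance(value, Path):
        # Absolute paths (e.g. files inside the BIDS dataset) are returned as-is,
        # so nothing is copied; relative paths are anchored to the output directory.
        return value if value.is_absolute() else Path(output_dir) / value
    if isinstance(value, dict):
        return {k: normalise_output_paths(v, output_dir) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(normalise_output_paths(v, output_dir) for v in value)
    return value  # non-path values are left untouched
```

Because the absolute path into the dataset survives unchanged, a downstream task receives the original file rather than a copy.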
