Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] returning packed object instead of dumping to files #78

Open
tom-tan opened this issue Nov 10, 2021 · 5 comments
Open

Comments

@tom-tan
Copy link
Member

tom-tan commented Nov 10, 2021

Currently upgrade_document has the following signature and always dumps the upgraded step objects into outpu_dir.

def upgrade_document(
    document: Any, v1_only: bool, v1_1_only: bool, output_dir: str, imports: Set[str]
) -> Any:

However, it makes harder to integrate other related tools such as cwl_utils.parser.load_document family.
I would be nice if upgrade_document has an option that returns a upgraded packed CWL object instead of directly writing upgraded CWL objects.

@tom-tan tom-tan changed the title [] [Feature Request] returning packed object instead of dumping to files Nov 10, 2021
@mr-c
Copy link
Member

mr-c commented Nov 10, 2021

Sure, I would happily review a PR to accomplish this

@tom-tan
Copy link
Member Author

tom-tan commented Nov 11, 2021

Currently I have a plan to revise the signature as follows.

def upgrade_document(
    document: Any,
    pack: bool,  # return a packed object or do not process external CWL doc in `run` fields
    output_dir: Optional[str] = None, # dump upgraded documents if provided
    target_version: Optional[str] = "latest", # Issue #77
    imports: Optional[Set[str]] = None,
) -> Any:
    ...

or

# omit output_dir option not to write resulted files
def upgrade_document(
    document: Any,
    pack: bool,  # return a packed object or do not process external CWL doc in `run` fields
    target_version: Optional[str] = "latest", # Issue #77
    imports: Optional[Set[str]] = None,
) -> Any:
    ...

I prefer the latter because we can focus on upgrading a given CWL document and easily integrate with other output functions.

@mr-c
Copy link
Member

mr-c commented Nov 11, 2021

I prefer the latter because we can focus on upgrading a given CWL document and easily integrate with other output functions.

What if we're not packing? How are the upgraded documents for each step returned to the caller?

@tom-tan
Copy link
Member Author

tom-tan commented Nov 12, 2021

What if we're not packing?

In my plan, it only upgrades a given CWL document and leaves other documents referred as a path or a URI in run fields as is (on the other hand, embedded documents are upgraded).
It covers the use case in which a given CWL refers external URI that is not owned by users.

How are the upgraded documents for each step returned to the caller?

There are several use cases but we do not have to support them directly with upgrade_document, IMO.

  • All step documents are located in the same directory as a given workflow document (dir/workflow.cwl, dir/step1.cwl, dir/step2.cwl...).
    • We can simply apply upgrade_document to all the documents in the directory.
  • Some documents are located in the server (e.g., https://raw.githubusercontent.com/.../step1.cwl: current implementation does not support this case)
    • Fetching external documents should be done in other function (will be provided in cwl-utils?).
  • A workflow documents and all step documents are located in different directories (dir/workflow.cwl, dir/steps/step1.cwl, dir/steps/step2.cwl...).
    • IMO it is not a good idea to support this case with upgrade_document because it contains dangerous cases. For example, consider the case of dir/workflow/workflow.cwl and dir/tools/step1.cwl. dir/workflow/workflow.cwl refers dir/tools/step1.cwl as run: ../tools/step1.cwl. If upgrade_document upgrades all the documents with keeping its directory structure, it accidentally outputs files outside the specified output_dir. Other handling (e.g., upgrade documents without keeping directory structure, warning if trying to output files outside a given output_dir) only covers some use cases.
    • It is better to make directories by hand to fit users' requirements.

@mr-c
Copy link
Member

mr-c commented Nov 12, 2021

Sounds like you are suggesting we specialize the function into 2 different functions (that share code):

  1. Single document only, returned as JSON-compatible objects (may result in mixed versions)
  2. One or more documents on disk, either updated in place or output to a new directory

and that both options also accept a URL to a remote document (necessitating an output directory in the case of 2)

I find this an agreeable plan. The interface is more useful and no functionality is lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants