Multiple output formats #901

Open
fmigneault opened this issue May 21, 2020 · 4 comments

@fmigneault

Hi, I am aware of #482 where outputs have been limited explicitly to only a single format.
I have more of a question regarding that decision.

I do not understand why it is critical that an output cannot have multiple potential formats.
Say, you have an app that takes as input --format = YAML|JSON, which produces the corresponding output.json or output.yml. Then, the format validation should be able to handle either of the corresponding IANA references.

Certainly, the actual output can have only one of the potential formats, and if none matches the process is invalid, but this should be a runtime validation/restriction, not a limitation of the CWL definition. Matching the output format could be done with simple extension matching or with a more advanced parser/schema validator. This is what we do in WPS (example docs), which I'm trying to map to CWL functionality.

Thanks in advance for insights.

@mr-c
Member

mr-c commented May 21, 2020

Hello @fmigneault

That's great to hear that you are mapping WPS to CWL!

As you know, the format field in the outputs section of a CWL CommandLineTool is either a single static string declaration or a dynamic CWL expression that produces a single format identifier: https://www.commonwl.org/v1.0/CommandLineTool.html#CommandOutputParameter

With the latter, a CWL tool description author can dynamically set the output format based upon input parameters or some output of the program itself, or pass through the format of one of the input files. The pass-through case, which I call "JSON in -> JSON out", looks something like format: $(inputs.main_input.format).
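
As a minimal, untested sketch of the pass-through case: the tool below is not from any real project, and the accepted input formats and the file name passthrough_copy are illustrative assumptions. The output simply reuses whatever format the input File object carried.

```yaml
#!/usr/bin/env cwl-runner
# Illustrative sketch only: `cat` stands in for a real program that
# preserves the encoding of its input.
cwlVersion: v1.0
class: CommandLineTool
baseCommand: cat
inputs:
  main_input:
    type: File
    # Either of these formats is accepted on input (assumed pair).
    format: [iana:application/json, iana:application/xml]
    inputBinding:
      position: 0
stdout: passthrough_copy
outputs:
  main_output:
    type: File
    outputBinding:
      glob: passthrough_copy
    # "JSON in -> JSON out": reuse the format of the input File.
    format: $(inputs.main_input.format)
$namespaces: { iana: https://www.iana.org/assignments/media-types/ }
```

This presumes the input File object actually carries a format field; if it does not, the expression has nothing to pass through.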

As you have noted, the dynamic format expression without a list of potential formats (while convenient) does weaken the ability of a workflow engine to do type verification of a CWL workflow and all its components. Though from a type verification perspective, it would still be better to know the exact format without executing the CWL CommandLineTool description.


Ideally, the authors of CWL CommandLineTool descriptions would not be in this place to start with. Here are two techniques they can use to avoid this situation:

  1. For some (sub)commands, the difference in output format implies a different overall function. In that case, it is best for everyone to split the CWL tool descriptions so that there is just one CWL tool description for each function of the (sub)command.

  2. If the output formats are semantically identical, but different encodings, then one could be opinionated and choose one of the formats over the other in the CWL tool description.
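
The second technique might look like the following sketch, where the description commits to JSON and declares it statically. Here echo is only a stand-in for a program with a switchable output encoding (such as the hypothetical --format option from the original question), and the output file name is an assumption.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Stand-in command; a real description would hard-code the program's
# own option for selecting JSON here instead.
baseCommand: echo
arguments: ["{}"]
stdout: output.json
outputs:
  result:
    type: File
    outputBinding:
      glob: output.json
    # One fixed format, known without running the tool.
    format: iana:application/json
$namespaces: { iana: https://www.iana.org/assignments/media-types/ }
```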


For the situation where one wants to pass through an input format (the "JSON in -> JSON out" example), the format can be reasoned about prior to execution of the entire workflow (in the best case) or just before execution of the CommandLineTool (why run a step if you know the next step won't accept the input?). For this situation I don't feel that an extension to the CWL standards is needed.

Your request for a way to list potential formats could be a nice enhancement, perhaps an optional field potential_formats whose contents are an array of strings (no CWL expressions)? If you hurry, this might squeeze in for v1.2 if you send a PR to https://github.com/common-workflow-language/cwl-v1.2/ and an implementation to https://github.com/common-workflow-language/cwltool

As a workaround, you can add format verification in the format CWL expression by using InlineJavascriptRequirement and throwing an exception if the computed format was not the expected one.
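
A rough, untested sketch of that workaround follows. The input name output_format, the file names, and the use of echo as a stand-in program are all assumptions made for illustration. The expression returns full format URIs rather than iana: prefixed names, on the assumption that a runner may not expand namespace prefixes in the result of an expression.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
# Stand-in command: "{}" happens to be valid as both JSON and YAML.
baseCommand: echo
arguments: ["{}"]
inputs:
  output_format:
    type: string   # expected to be "json" or "yaml"
stdout: output.$(inputs.output_format)
outputs:
  result:
    type: File
    outputBinding:
      glob: output.$(inputs.output_format)
    # Map the requested encoding to a format identifier and reject
    # anything else instead of silently passing it through.
    format: |
      ${
        var base = "https://www.iana.org/assignments/media-types/";
        if (inputs.output_format === "json") {
          return base + "application/json";
        }
        if (inputs.output_format === "yaml") {
          return base + "application/yaml";
        }
        throw "Unexpected output format: " + inputs.output_format;
      }
```

The check only runs while outputs are being collected, so it is a run-time guard, not something a validator can use ahead of execution.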

You have an additional request to automatically assign the format field based upon some characteristic of the program's output. This is possible today using the InlineJavascriptRequirement and putting the logic in the format field as mentioned before.


While researching this response, I discovered that cwltool does no format checking during the validation phase, only at run time; this was a surprise to me! common-workflow-language/cwltool#1290

@fmigneault
Author

@mr-c
Thanks for the explanation.
The input expression should be sufficient in most use cases, and InlineJavascriptRequirement could be used for more advanced validation. So there probably wouldn't be much use for potential_formats other than documentation.

There is one case that I would like to validate. If I have an input array of Files whose format is defined as an array of possible formats, and $(inputs.main_input.format) is used for the generated output format also being an array, will it get evaluated per-file element so that processing would result as [JSON, XML, YAML] -> [JSON, XML, YAML] and all be considered valid?

@mr-c
Member

mr-c commented May 21, 2020

$(inputs.main_input.format) is used for the generated output format also being an array, will it get evaluated per-file element so that processing would result as [JSON, XML, YAML] -> [JSON, XML, YAML] and all be considered valid?

Good question!

I just discovered that we don't clearly specify whether type: File[] inputs with format: [iana:text/plain, iana:text/rtf] require all members to have the same format. Nor do we have a way to state explicitly that they must share one format, or that they are explicitly allowed to differ. One could work around this with the same InlineJavascriptRequirement exception-throwing trick I mentioned.

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  main_inputs:
    type: File[]
    format: [iana:text/plain, iana:text/xml]
    inputBinding:
      position: 0
    default:
      - class: File
        basename: a
        format: iana:text/plain
        contents: |
         A
      - class: File
        basename: b
        format: iana:text/xml
        contents: |
         B
baseCommand: cat
outputs: []
$namespaces: { iana: https://www.iana.org/assignments/media-types/ }

The above does not produce an error with cwltool 3.0.20200324120055

As for the format of outputs: the spec defines the field using the singular definite article "the":

This is the file format that will be assigned to the output parameter.

So it seems that in CWL v1.0.x and v1.1, that Files in type: File[] output must either all have the same format or no format. This is the behavior of cwltool as well, as I just confirmed.

Also, if inputs.main_input is defined as type: File[] then it is a list of class: File objects, and the list itself has no format field. So if you have an output defined as type: File[] and you put $(inputs.main_input.format) in the format field, that expression will fail.

If you are okay with assuming all members of a type: File[] input have the same format, and you want to pass that through, then for the output format field you can use format: $(inputs.main_input[0].format), no InlineJavascriptRequirement necessary.
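
If you would rather check that assumption than rely on it, a variant that does need InlineJavascriptRequirement could look like the untested sketch below. The use of cp to produce one output File per input, and the glob built from the input basenames, are illustrative assumptions; the input array is also assumed to be non-empty and to have distinct basenames.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
# `cp <files> .` copies every input into the output directory, so the
# tool yields one output File per input File.
baseCommand: cp
arguments:
  - valueFrom: "."
    position: 1
inputs:
  main_input:
    type: File[]   # assumed to be non-empty
    format: [iana:text/plain, iana:text/xml]
    inputBinding:
      position: 0
outputs:
  main_output:
    type: File[]
    outputBinding:
      glob: "$(inputs.main_input.map(function (f) { return f.basename; }))"
    # Fail if the inputs disagree on format; otherwise pass the shared
    # format through to every output File.
    format: |
      ${
        var first = inputs.main_input[0].format;
        for (var i = 1; i < inputs.main_input.length; i++) {
          if (inputs.main_input[i].format !== first) {
            throw "Mixed input formats: " + first + " and " + inputs.main_input[i].format;
          }
        }
        return first;
      }
$namespaces: { iana: https://www.iana.org/assignments/media-types/ }
```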

@fmigneault
Author

So it seems that in CWL v1.0.x and v1.1, that Files in type: File[] output must either all have the same format or no format. This is the behavior of cwltool as well, as I just confirmed.

Yes, this is the same conclusion I came to. On my side, when I obtain a multi-format and/or array-type output from WPS, I have to drop the corresponding CWL format, because there is no way to allow an array of different formats and I cannot simply assume one of the potential formats provided.

Thanks for the validation 👍
