Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema_salad.exceptions.ValidationExceptions parsing certain data structures #31

Open
leipzig opened this issue Sep 4, 2020 · 12 comments
Assignees

Comments

@leipzig
Copy link

leipzig commented Sep 4, 2020

import cwl_utils.parser_v1_2 as parser
parser.load_document("https://raw.githubusercontent.com/common-workflow-library/bio-cwl-tools/release/picard/picard_CreateSequenceDictionary.cwl")
  InitialWorkDirRequirement:
    listing:
      - $(inputs.REFERENCE)
outputs:
  ...
  sequences_with_dictionary:
    type: File
    format: edam:format_2573  # SAM
    secondaryFiles:
      - .dict
      - .fai
parser.load_document("https://raw.githubusercontent.com/common-workflow-library/bio-cwl-tools/release/bamtools/bamtools_stats.cwl")
bamtools_stats.cwl:110:5:  the `outputBinding` field is not valid...
the `outputEval` field is not valid...
@leipzig
Copy link
Author

leipzig commented Sep 6, 2020

secondaryFiles within outputs is not terribly common but listing should accept that expression
https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/tests/test_schema/CommandLineTool.yml#L776-L781

@stain
Copy link
Member

stain commented Sep 8, 2020

Probably not related, but you are testing with a v1.2 parser even though document declares cwlVersion: 1.2

@tetron
Copy link
Member

tetron commented Sep 8, 2020

@leipzig so what's actually happening here is that codegen was never updated for the schema salad v1.1, which added a few features that are used for CWL 1.1 and 1.2. Specifically:

  1. secondaryFilesDSL
  2. Special treatment of the Expression type
  3. subscopes (used for assigning identifiers)
  4. default values

The first two are causing the problems you're reporting. Some suggestions:

  1. You need to add a case for secondaryFilesDSL, see https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/codegen.py#L113 and https://www.commonwl.org/v1.1/SchemaSalad.html#Domain_Specific_Language_for_secondary_files
  2. You need to add a special case for the Expression type https://github.com/common-workflow-language/schema_salad/blob/af9842edc3e55723bfd0c50562ab44c7daa35226/schema_salad/python_codegen.py#L285 and add an _ExpressionLoader which should pretty much just check if the value is a string
  3. declare_field https://github.com/common-workflow-language/schema_salad/blob/af9842edc3e55723bfd0c50562ab44c7daa35226/schema_salad/python_codegen.py#L368 needs an extra subscope parameter, if the subscope is non-empty, then it gets appended to baseuri when loading the field
  4. begin_class should be extended to so that fields with default or optional are sorted to the end of the __init__ parameters and assigned None, and then in the body of __init__ any fields with default that are None should be assigned the default value.

@tetron
Copy link
Member

tetron commented Sep 8, 2020

What you should do is make a PR against schema-salad that fixes these issues, and a PR against cwl-utils with updated v1.1 and v1.2 parsers and test cases of previously-failing v1.1 and v1.2 files.

@leipzig
Copy link
Author

leipzig commented Sep 12, 2020

@tetron I am a bit confused about number 3. Does it apply only to declare_id_field or declare_field as well. It seems the subscope argument is something that appears as when jsonldPredicate is a list, so it would be more relevant to declare_field

      jsonldPredicate:
        _id: "cwl:run"
        _type: "@id"
        subscope: run

but that when it's jsonldPredicate: "@id" we are talking about workflow id

@tetron
Copy link
Member

tetron commented Sep 14, 2020

@leipzig When the predicate is @id that means that field is the identifier for the object it appears in.

A field with _type: "@id" means it is a reference to an identifier.

A subscope means that jsonldPredicate: "@id" fields get an extra path element to avoid identifier name conflicts with the parent. This is done by adding the subscope to the baseuri that is used to construct identifiers in the nested objects.

Does that help?

@leipzig
Copy link
Author

leipzig commented Sep 22, 2020

Should I just concatenate baseuri+subscope?

@tetron
Copy link
Member

tetron commented Sep 22, 2020

Pretty much. Here's what ref_resolver.py does

subscope = ""  # type: str
if key in loader.subscopes:
    subscope = "/" + loader.subscopes[key]
document[key], _ = loader.resolve_all(
    val, base_url + subscope, file_base=file_base, checklinks=False
)

@tetron
Copy link
Member

tetron commented Sep 22, 2020

(the important part is that it also adds a slash)

@mr-c
Copy link
Member

mr-c commented Oct 23, 2020

@leipzig Are you still working on this? Is there a branch of what you've tried already?

@leipzig
Copy link
Author

leipzig commented Oct 23, 2020

@mr-c i have some progress on this branch https://github.com/leipzig/schema_salad/tree/main
can you point me to a 1.2 example on hand with all of these features?

@mr-c
Copy link
Member

mr-c commented Oct 23, 2020

@leipzig Huzzah!

Here are some CWL v1.2 examples. courtesy @jdidion

https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-sd-secondaryFiles.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/anon_enum_inside_array_inside_schemadef.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array_basename_and_entryname.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/docker-array-secondaryfiles.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/optional-output.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-in-secondaryFiles.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/initialworkdirrequirement-docker-out.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/secondaryfiles/rename-outputs.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array_basename.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_array.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/initialworkdir-glob-fullpath.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-out-secondaryFiles.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/secondaryfiles/rename-inputs.yml
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/params.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/params2.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/schemadef-tool.cwl
https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/imported-hint.cwl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants