Documentation for importing packed files #71

Open
alexiswl opened this issue Apr 15, 2021 · 10 comments

Comments

@alexiswl
Contributor

Hello,

I've been playing around with how to import a packed CWL JSON file as a CWL parser object.

Here are my steps so far:

Setup

# Imports
from pathlib import Path
import json
import sys

# Set path
cwl_file_path = Path("/path/to/cwl.packed.json")

# Load file as dict
# Read in the cwl file from a json
with open(cwl_file_path, "r") as cwl_h:
    cwl_file_dict = json.load(cwl_h)
    

# Conditional import based on cwl version
if 'cwlVersion' not in cwl_file_dict:
    print("Error - could not get the cwlVersion")
    sys.exit(1)
# Import parser based on CWL Version
if cwl_file_dict['cwlVersion'] == 'v1.0':
    from cwl_utils import parser_v1_0 as parser
elif cwl_file_dict['cwlVersion'] == 'v1.1':
    from cwl_utils import parser_v1_1 as parser
elif cwl_file_dict['cwlVersion'] == 'v1.2':
    from cwl_utils import parser_v1_2 as parser
else:
    print("Version error. Did not recognise {} as a CWL version".format(cwl_file_dict["cwlVersion"]))
    sys.exit(1)
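The if/elif chain above can also be written as a lookup table. This is a minimal sketch, assuming the three cwl_utils module names imported in the script; parser_module_name and import_parser are hypothetical helper names, and raising ValueError instead of calling sys.exit is a design choice so the check can be reused from other code.

```python
import importlib
from types import ModuleType

# Map each supported cwlVersion to the cwl_utils parser module used in the script above.
PARSER_MODULES = {
    "v1.0": "cwl_utils.parser_v1_0",
    "v1.1": "cwl_utils.parser_v1_1",
    "v1.2": "cwl_utils.parser_v1_2",
}


def parser_module_name(cwl_doc: dict) -> str:
    """Return the parser module name for a loaded CWL document, or raise ValueError."""
    version = cwl_doc.get("cwlVersion")
    if version is None:
        raise ValueError("could not get the cwlVersion")
    try:
        return PARSER_MODULES[version]
    except KeyError:
        raise ValueError(f"did not recognise {version!r} as a CWL version") from None


def import_parser(cwl_doc: dict) -> ModuleType:
    """Import and return the matching parser module (requires cwl_utils to be installed)."""
    return importlib.import_module(parser_module_name(cwl_doc))
```

With this in place, parser = import_parser(cwl_file_dict) replaces the whole conditional block, and adding a new CWL version only means adding one dictionary entry.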

First attempt:

Use the load_document function

parser.load_document(cwl_file_dict, cwl_file_path.absolute().as_uri()) 

SchemaSaladException: Cannot load $import without fileuri
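Since this attempt fails on a leftover $import reference, it can help to list where those references appear in the packed document. This is a minimal sketch in plain Python, assuming the document has already been loaded into nested dicts and lists (as cwl_file_dict is above); find_imports is a hypothetical helper, not part of cwl_utils, and the "$.key[index]" path strings are just an illustrative notation.

```python
from typing import Any, Iterator, Tuple


def find_imports(node: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, target) for every "$import" key found anywhere in a loaded document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$import":
                # The value of $import is the reference that could not be resolved.
                yield path, value
            else:
                yield from find_imports(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_imports(item, f"{path}[{index}]")
```

For example, looping over find_imports(cwl_file_dict) and printing each pair shows which part of the $graph still points at an external schema file.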

Second attempt:

Convert to a JSON string, then load

parser.load_document_by_string(json.dumps(cwl_file_dict), cwl_file_path.absolute().as_uri())

ValidationException: - tried _RecordLoader but
  Expected a dict
- tried _RecordLoader but
  Expected a dict
...

Third attempt:

Convert to YAML, then load

# We need to import the ruamel yaml class
from ruamel import yaml
# Dump our dict to a yaml string
cwl_yaml_dump = yaml.round_trip_dump(cwl_file_dict, Dumper=yaml.RoundTripDumper)
# Load yaml
cwl_yaml_load = yaml.round_trip_load(cwl_yaml_dump, preserve_quotes=True)
# Import 
parser.load_document_by_yaml(cwl_yaml_load, cwl_file_path.absolute().as_uri())

ValidationException: - tried _RecordLoader but
  Expected a dict
- tried _RecordLoader but
  Expected a dict
...

Fourth attempt:

Convert just the $graph to YAML, then load

# We need to import the ruamel yaml class
from ruamel import yaml
# Dump our dict to a yaml string
cwl_yaml_dump = yaml.round_trip_dump(cwl_file_dict['$graph'], Dumper=yaml.RoundTripDumper)
# Load yaml
cwl_yaml_load = yaml.round_trip_load(cwl_yaml_dump, preserve_quotes=True)
# Import 
parser.load_document_by_yaml(cwl_yaml_load, cwl_file_path.absolute().as_uri())

ValidationException: - tried _RecordLoader but
  Expected a dict
- tried _RecordLoader but
  Expected a dict
...

Is this because my workflow, which uses record schemas, is a bit too complicated for the parser?

@mr-c
Member

mr-c commented Apr 15, 2021

Hey @alexiswl ; can you put an example packed workflow that exhibits this issue on https://gist.github.com/ or similar and drop the link here?

@alexiswl
Contributor Author

@mr-c
Member

mr-c commented Sep 3, 2021

@alexiswl FYI, that file has ids in its custom types, which is not formally part of the CWL standard: https://www.commonwl.org/v1.2/CommandLineTool.html#CommandInputRecordSchema
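To check a document for this, here is a minimal sketch that walks a loaded CWL document and reports record types declared under a SchemaDefRequirement that also carry an id field. It assumes the list form of requirements (entries with a class key, as in the packed output) and that types are inline dicts; types_with_ids is a hypothetical helper.

```python
from typing import Any, Iterator


def types_with_ids(node: Any) -> Iterator[str]:
    """Yield the name (or id) of every SchemaDefRequirement type that carries an "id" field."""
    if isinstance(node, dict):
        if node.get("class") == "SchemaDefRequirement":
            for schema_type in node.get("types", []):
                if isinstance(schema_type, dict) and "id" in schema_type:
                    yield schema_type.get("name", schema_type["id"])
        # Keep walking: requirements can appear on every process in a $graph.
        for value in node.values():
            yield from types_with_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from types_with_ids(item)
```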

@alexiswl
Contributor Author

alexiswl commented Sep 4, 2021

Hi @mr-c, do you know why this might be? The raw yaml is now publicly accessible at https://github.com/umccr/cwl-ica/blob/main/workflows/bcl-conversion/3.7.5/bcl-conversion__3.7.5.cwl

None of the schemas present have the id attribute in them either:

At the moment, to import workflows that contain schemas through the CWL parser, I have to first import the schema object and then manually append it to the namespace.

See:
https://github.com/umccr/cwl-ica/blob/main/src/classes/cwl.py#L135-L154

For packed CWL files this would be a little more difficult, because I would first need to find the SchemaDefRequirement inside the graph and add its schemas to the $namespaces attribute of the graph.

Something like the following could grab the schemas required by the workflow:

$ cwltool --pack bcl-conversion__3.7.5.cwl | \
jq --raw-output '.["$graph"][-1].requirements[] | select(.class=="SchemaDefRequirement") | .types[] | .["$import"]'

#settings-by-samples__1.0.0.yaml
#fastq-list-row__1.0.0.yaml

The jq part of this would be done in Python.
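A rough Python take on that jq filter might look like the sketch below. Like the jq command, it assumes the main workflow is the last entry in $graph and that requirements use the list form; schema_imports is a hypothetical name. One difference from jq: types without a $import key are skipped here rather than printed as null.

```python
from typing import List


def schema_imports(packed: dict) -> List[str]:
    """Return the $import targets of SchemaDefRequirement types in the last $graph entry."""
    last_process = packed["$graph"][-1]
    imports = []
    for requirement in last_process.get("requirements", []):
        if requirement.get("class") != "SchemaDefRequirement":
            continue
        for schema_type in requirement.get("types", []):
            # jq would emit null for types without $import; this sketch skips them.
            if isinstance(schema_type, dict) and "$import" in schema_type:
                imports.append(schema_type["$import"])
    return imports
```

Given the packed output loaded with json.load, schema_imports would be expected to return the same references jq printed above.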

Still, it seems quite hacky that this is required.

@mr-c
Member

mr-c commented Sep 4, 2021

@alexiswl As you can see, your helpful example has launched many fixes to cwltool --pack, the code in schema_salad that produces the parsers, and the schema of the CWL standards themselves (!).

Ultimately (when all is done, merged, and released) the answer to your question will be "Load the packed document like any other." :-)

FYI, here is my variation on your testing script:

"""
Import a cwl file as a parser object
"""

import sys
from pathlib import Path

from schema_salad.utils import yaml_no_ts 
# ^^ requires schema_salad >= 8.2
# does preserve_quotes=True and more

# Set path
cwl_file_path = Path(sys.argv[1])

# Load file as yaml dict
# Read in the cwl file from a json/yaml
with open(cwl_file_path, "r") as cwl_h:
    cwl_file_yaml = yaml_no_ts().load(cwl_h)

# Conditional import based on cwl version
if 'cwlVersion' not in cwl_file_yaml:
    print("Error - could not get the cwlVersion")
    sys.exit(1)
# Import parser based on CWL Version
if cwl_file_yaml['cwlVersion'] == 'v1.0':
    from cwl_utils import parser_v1_0 as parser
elif cwl_file_yaml['cwlVersion'] == 'v1.1':
    from cwl_utils import parser_v1_1 as parser
elif cwl_file_yaml['cwlVersion'] == 'v1.2':
    from cwl_utils import parser_v1_2 as parser
else:
    print("Version error. Did not recognise {} as a CWL version".format(cwl_file_yaml["cwlVersion"]))
    sys.exit(1)

doc = parser.load_document_by_yaml(cwl_file_yaml, cwl_file_path.absolute().as_uri())

@alexiswl
Contributor Author

alexiswl commented Sep 6, 2021

Thanks for this, @mr-c! I appreciate the feedback and am very happy to know that this has led to fixes in multiple places!

Do you recommend the yaml_no_ts from https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/utils.py#L133 over ruamel's round_trip_load from https://sourceforge.net/p/ruamel-yaml/code/ci/default/tree/main.py#l1132 ?

Is the only difference the loading of timestamps?

@mr-c
Member

mr-c commented Sep 6, 2021

Is the only difference the loading of timestamps?

Correct. Probably not needed in your case.
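One practical reason timestamp handling can matter: a YAML loader that resolves timestamps may turn an unquoted value such as 2021-09-06 into a datetime.date, and the standard json module cannot serialize that, whereas the plain string round-trips fine. This is a stdlib-only sketch of that difference; the values are illustrative and is_json_serializable is a hypothetical helper.

```python
import datetime
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if json.dumps can serialize value with default settings."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# What a timestamp-resolving YAML loader could produce for "date: 2021-09-06".
with_timestamp = {"date": datetime.date(2021, 9, 6)}
# What a loader that skips timestamp resolution keeps instead: the original string.
without_timestamp = {"date": "2021-09-06"}
```

Here is_json_serializable(with_timestamp) is False and is_json_serializable(without_timestamp) is True, which only matters if the loaded document is later dumped to JSON.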

@mr-c
Member

mr-c commented Sep 15, 2021

@alexiswl Can you try packing with https://github.com/rabix/sbpack ?

@alexiswl
Contributor Author

Thanks for the suggestion @mr-c, it looks like this would handle most of the workarounds we're currently doing. Is there 'local-only' functionality in this tool, or a way to import a local packed file? We don't use the Seven Bridges endpoint.

@mr-c
Member

mr-c commented Sep 16, 2021

Oh, I should have been more specific! It includes a local-only tool named cwlpack.
