documentation: subworkflows #13

Open
matthdsm opened this issue Mar 6, 2020 · 9 comments
Labels
documentation Requires documentation

Comments

@matthdsm

matthdsm commented Mar 6, 2020

Hi,

Would it be possible to add a quick comment on how to use subworkflows?
Do I just add them as in a "master" workflow?

Thanks
M

@illusional
Member

illusional commented Mar 6, 2020

You can add them to the registry in exactly the same way as a command tool (by declaring it and importing it in the provider’s init). There’s an example here: https://github.com/PMCC-BioinformaticsCore/janis-bioinformatics/blob/master/janis_bioinformatics/tools/common/bwaaligner.py

You can use them interchangeably with a command tool.

Here’s an example where subworkflows are used, and where another subworkflow is declared in the method “self.process_subpipeline”:

https://github.com/PMCC-BioinformaticsCore/janis-pipelines/blob/6ff3929e56aafe5616cb9fc2310b6d8198c97690/janis_pipelines/wgs_somatic/wgssomatic.py#L65

Same if you use a workflow builder:

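subwf = WorkflowBuilder(...)
# build subwf here

wf = WorkflowBuilder(...)
# calling the subworkflow with its inputs makes it usable as a step
wf.step("subWfStepId", subwf(**inputMap))
# expose one of the subworkflow's outputs from the outer workflow
wf.output("outFromSubWf", source=wf.subWfStepId.nameOfOutput)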

I’ll leave this open as I still need to document it.

@matthdsm
Author

matthdsm commented Mar 6, 2020

Great! Thanks for the quick reply!

Cheers
M

@illusional
Member

No worries! Feel free to keep raising issues here, I'm very happy to answer them!

It’s actually amazing that planes have WIFI.

@matthdsm
Author

matthdsm commented Mar 6, 2020

Talk about "over the air" updates 😉

@matthdsm
Author

matthdsm commented Mar 6, 2020

Unrelated question:
Say I have an array of FastqGz from bcl2fastq and I need to create a sample map that's consumable as a list of files (a fofn, in GATK terms).

Practically, I need something along the lines of

bcl2fastq -> Array(FastqGz)
    -> "Unknown method"
        -> FastqGzPair + sampleName: String()
            -> Gatk4FastqToSamLatest.fastqR1, Gatk4FastqToSamLatest.fastqR2

I'm thinking about creating a python tool that parses the list of FastqGz to an object formatted as

{
    samplename: {
        "R1": samplename_R1.fastq.gz,
        "R2": samplename_R2.fastq.gz
    },
    ...
}

but I'm unsure how to implement this correctly as something that'll make sense in Janis.

Any ideas? Advice?

Thanks already.
Cheers
M

@illusional
Member

Yes, you could build a PythonTool that returns an object:

{
    "sampleName": YourSampleName,
    "R1": samplename_R1.fastq.gz,
    "R2": samplename_R2.fastq.gz
}

Which could map to the outputs:

  • sampleName: String()
  • R1: FastqGz
  • R2: FastqGz

Ultimately, it would be useful in Janis to refer to the first index of an output (e.g. w.bclStep.fastqs[0]), but we’re still a little way off that; see #8.
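Before such a tool can fill in sampleName, R1 and R2, each file name has to be split into a sample name and a read label. The following is a minimal sketch in plain Python (no Janis calls). It assumes Illumina-style names of the form `<sample>_S<number>[_L<lane>]_R<1|2>_001.fastq.gz`, for example `D1710903_S64_R1_001.fastq.gz`. The names `FASTQ_NAME`, `ReadFile` and `parse_fastq_name` are hypothetical and only serve the illustration.

```python
import os
import re
from typing import NamedTuple, Optional

# Assumed naming pattern: <sample>_S<number>[_L<lane>]_R<1|2>_<chunk>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+(?:_L\d+)?_(?P<read>R[12])_\d+\.fastq\.gz$"
)


class ReadFile(NamedTuple):
    sample: str
    read: str  # "R1" or "R2"


def parse_fastq_name(path: str) -> Optional[ReadFile]:
    """Return the sample name and read label, or None if the name does not match."""
    match = FASTQ_NAME.match(os.path.basename(path))
    if match is None:
        return None
    return ReadFile(match.group("sample"), match.group("read"))


parsed = parse_fastq_name("run1/D1710903_S64_R1_001.fastq.gz")
```

Anchoring the read label to its position in the name avoids misclassifying a sample whose own name happens to contain "R1" or "R2", which a plain substring check would do.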

@matthdsm
Author

matthdsm commented Mar 6, 2020

Great, thanks!
So I suppose something like this should work?

class GenerateSampleMap(janis.PythonTool):
    def id(self):
        return "GenerateSampleMap"

    def version(self):
        return "v0.0.1"

    @staticmethod
    def code_block(files_list: List[str]):
        samplemap = {}
        for filename in files_list:
            samplename = filename.split("_S")[0]
            if samplename not in samplemap:
                samplemap[samplename] = {}
            if "R1" in filename:
                samplemap[samplename]["R1"] = filename
            elif "R2" in filename:
                samplemap[samplename]["R2"] = filename

        return [{"samplename": k, **v} for k, v in samplemap.items()]

    def outputs(self) -> List[TOutput]:
        return [
            TOutput("samplename", String()),
            TOutput("R1", FastqGz()),
            TOutput("R2", FastqGz()),
        ]

@illusional
Member

Ah I see I see. We don’t support these custom structures. I’d recommend making each return type an array:


    def outputs(self) -> List[TOutput]:
        return [
            TOutput("samplename", Array(String())),
            TOutput("R1", Array(FastqGz())),
            TOutput("R2", Array(FastqGz())),
        ]

(And change your Python code to suit.)

Then, when you use the result from this tool, you can scatter by dot product across all three fields: https://github.com/PMCC-BioinformaticsCore/janis-workshops/blob/master/workshop2/6-scatter.md
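To make the "one array per output" suggestion concrete, here is a minimal sketch of the grouping logic in plain Python, with no Janis imports. It assumes the file names follow the `<sample>_S<number>_R<1|2>_001.fastq.gz` pattern and that the returned keys are meant to line up with the output names `samplename`, `R1` and `R2`. The function name `build_sample_arrays` is hypothetical. The final `zip` only imitates what a dot-product scatter does (element i of every array goes to job i); it is not how Janis implements scattering.

```python
from collections import defaultdict
from typing import Dict, List


def build_sample_arrays(filenames: List[str]) -> Dict[str, List[str]]:
    """Group R1/R2 files per sample and return three index-aligned arrays."""
    reads_by_sample: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        sample = name.split("_S")[0]  # same split rule as used in this thread
        if "_R1_" in name:
            reads_by_sample[sample]["R1"] = name
        elif "_R2_" in name:
            reads_by_sample[sample]["R2"] = name

    arrays: Dict[str, List[str]] = {"samplename": [], "R1": [], "R2": []}
    for sample in sorted(reads_by_sample):
        reads = reads_by_sample[sample]
        if "R1" not in reads or "R2" not in reads:
            # A missing mate would silently misalign the arrays, so fail early.
            raise ValueError(f"sample {sample!r} is missing its R1 or R2 file")
        arrays["samplename"].append(sample)
        arrays["R1"].append(reads["R1"])
        arrays["R2"].append(reads["R2"])
    return arrays


files = [
    "D1820847_S46_R2_001.fastq.gz",
    "D1710903_S64_R1_001.fastq.gz",
    "D1710903_S64_R2_001.fastq.gz",
    "D1820847_S46_R1_001.fastq.gz",
]
arrays = build_sample_arrays(files)

# Dot-product pairing: one (samplename, R1, R2) tuple per downstream job.
jobs = list(zip(arrays["samplename"], arrays["R1"], arrays["R2"]))
```

Sorting the sample names keeps the three arrays in a deterministic order, and the mate check guards the index alignment that a dot product relies on.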

@matthdsm
Author

matthdsm commented Mar 6, 2020

Awesome, thanks for the help!
The code is now:

class GenerateSampleMap(janis.PythonTool):
    def id(self):
        return "GenerateSampleMap"

    def version(self):
        return "v0.0.1"

    @staticmethod
    def code_block(files_list: List[str]):
        samplemap = {}
        for filename in files_list:
            samplename = filename.split("_S")[0]
            if samplename not in samplemap:
                samplemap[samplename] = {}
            if "R1" in filename:
                samplemap[samplename]["R1"] = filename
            elif "R2" in filename:
                samplemap[samplename]["R2"] = filename

        return [[v[key] for key in sorted(v)] for v in samplemap.values()]

    def outputs(self) -> List[TOutput]:
        return [
            TOutput("R1", FastqGz()),
            TOutput("R2", FastqGz()),
        ]

which outputs roughly as

[['D1710903_S64_R1_001.fastq.gz', 'D1710903_S64_R2_001.fastq.gz'],
 ['D1820847_S46_R1_001.fastq.gz', 'D1820847_S46_R2_001.fastq.gz'],
 ['D1900814_S78_R1_001.fastq.gz', 'D1900814_S78_R2_001.fastq.gz'],
 ['D1904578_S33_R1_001.fastq.gz', 'D1904578_S33_R2_001.fastq.gz'],
 ['D1905752_S79_R1_001.fastq.gz', 'D1905752_S79_R2_001.fastq.gz'],
 ['D1908147_S47_R1_001.fastq.gz', 'D1908147_S47_R2_001.fastq.gz'],
 ['D1821957_S71_R1_001.fastq.gz', 'D1821957_S71_R2_001.fastq.gz'],
 ['D1905632_S84_R1_001.fastq.gz', 'D1905632_S84_R2_001.fastq.gz'],
 ['D1908155_S48_R1_001.fastq.gz', 'D1908155_S48_R2_001.fastq.gz'],
 ['D1812139_S1_R1_001.fastq.gz', 'D1812139_S1_R2_001.fastq.gz'],
 ['D1901986_S98_R1_001.fastq.gz', 'D1901986_S98_R2_001.fastq.gz'],
 ['D1907884_S45_R1_001.fastq.gz', 'D1907884_S45_R2_001.fastq.gz'],
 ['D1822234_S77_R1_001.fastq.gz', 'D1822234_S77_R2_001.fastq.gz'],
 ['D1905676_S2_R1_001.fastq.gz', 'D1905676_S2_R2_001.fastq.gz'],
 ['D1908600_S3_R1_001.fastq.gz', 'D1908600_S3_R2_001.fastq.gz']]

and is ideal for a dot product, as you said!
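If the list of pairs needs to be turned into one array per output (an R1 array and an R2 array), it can be transposed. This is a small sketch in plain Python that assumes every entry holds exactly two files in R1, R2 order; `split_pairs` is a hypothetical helper name, not part of Janis.

```python
from typing import List, Sequence, Tuple


def split_pairs(pairs: Sequence[Sequence[str]]) -> Tuple[List[str], List[str]]:
    """Transpose [[r1, r2], ...] into ([r1, ...], [r2, ...])."""
    r1_files: List[str] = []
    r2_files: List[str] = []
    for index, pair in enumerate(pairs):
        if len(pair) != 2:
            # A sample with a missing mate produces a short entry; reject it.
            raise ValueError(f"entry {index} has {len(pair)} file(s), expected 2")
        r1_files.append(pair[0])
        r2_files.append(pair[1])
    return r1_files, r2_files


pairs = [
    ["D1710903_S64_R1_001.fastq.gz", "D1710903_S64_R2_001.fastq.gz"],
    ["D1820847_S46_R1_001.fastq.gz", "D1820847_S46_R2_001.fastq.gz"],
]
r1_files, r2_files = split_pairs(pairs)
```

Both returned lists keep the sample order of the input, so index i in `r1_files` and `r2_files` still refers to the same sample.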

Thanks!
M

@illusional illusional added the documentation Requires documentation label Mar 23, 2020

2 participants