
Explain what sort of workflows CWL is for. #36

Open
wants to merge 1 commit into base: main

Conversation

@mr-c (Member) commented Oct 26, 2020

To address #35

cloud, and high performance computing (HPC) environments.

CWL is for dataflow-style batch analysis, where the units of processing are command-line programs.
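For readers new to CWL, a minimal sketch of one such command-line unit of processing could look like the following (the wc -l invocation and file names are purely illustrative, not part of this PR):

cwlVersion: v1.2
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
  text_file:
    type: File
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: line_count.txt

A CWL Workflow then wires the outputs of tools like this into the inputs of other tools; that wiring is the "dataflow style" referred to above.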


Do we think it would be beneficial to comment here on some known workflows / use cases that CWL does NOT handle well?

Example from chat:

"explicitly not for business process modeling"

And are there any other use cases that users might think of that aren't actually intended?


I would add something clarifying the use for batch processing vs. interactive processing, as we've sometimes had confusion about whether workflows are able to interact with external services such as databases or other APIs.

@mr-c (Member Author)

> Do we think it would be beneficial to comment here on some known workflows / use cases that CWL does NOT handle well?

It could (and I see the value in that!), but it probably leaves the reader with a better feeling to not have a list of negatives when they first learn about something. I also don't want this introduction to be too long or wordy. A bit tricky to balance!


Yeah, I can see where you are coming from there. It might also dissuade someone from trying it out if they do not fully understand what is meant by an item listed as "not supported" (i.e., CWL could fit their problem, but since they don't understand the terminology they might just not try it out). After thinking about it, it might do more harm than good.

@rupertnash

Apologies for banging my little drum (but @mr-c @-mentioned the Gitter channel): while CWL is excellent for handling high-throughput computing, it is not (yet) equipped to handle high-performance computing tasks.

@drkennetz

> Apologies for banging my little drum (but @mr-c @-mentioned the Gitter channel): while CWL is excellent for handling high-throughput computing, it is not (yet) equipped to handle high-performance computing tasks.

What is your scheduler, @rupertnash? I have been using CWL workflows for HPC for over a year now, using Toil as the runner for IBM LSF.

@rupertnash

My scheduler? We have both PBS Pro and SLURM machines at EPCC.

@rupertnash

But my point is that having thousands of independent tasks running across a cluster is high throughput. Unless the processes are communicating with each other, it's not HPC.

@geoffjentry (Contributor)

@rupertnash I suspect this is a case where different folks have different definitions of a term. It is common in the life sciences for "HPC" to colloquially imply on-prem job schedulers (e.g. LSF, SGE, SLURM, PBS, etc).

@mr-c (Member Author) commented Oct 29, 2020

@rupertnash I was going to suggest

The Common Workflow Language (CWL) is an open standard for describing analysis
workflows and tools in a way that makes them portable and scalable across a
variety of software and hardware environments, from workstations to cluster,
cloud, and HTC/HPC* environments.

And have the * link to an explanation of the plans to incorporate the MPIRequirement in a future version of the CWL standards. But then I noticed that https://en.wikipedia.org/wiki/High-performance_computing redirects to https://en.wikipedia.org/wiki/Supercomputer which is not at all helpful.

So I guess we'll take the HPC part out and leave in High-Throughput Computing until MPIRequirement has matured and been ratified.
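For context, a rough sketch of what the cwltool MPI extension looks like in its current, pre-ratification form (the namespace and processes field reflect the draft extension and may change; the command name is hypothetical):

cwlVersion: v1.2
class: CommandLineTool
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
requirements:
  cwltool:MPIRequirement:
    processes: 4
baseCommand: my_mpi_program   # hypothetical MPI executable
inputs: []
outputs: []

The runner is then expected to launch the command under mpirun (or an equivalent launcher) with the requested number of processes.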

@drkennetz

@rupertnash I guess I am a bit confused: if the individual tools are using processes that communicate with each other, then is that not HPC? We have a workflow that calls a tool that uses 32 cores across 2 nodes, which the scheduler handles. Those resources are requested by both the tool and the scheduler, if written correctly. So the tool requires high-performance computing, while the workflow just details the steps.

Are you referring to CWL itself using multiple processes talking to each other to execute a step?
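To be concrete about the tool-side request I mean: in CWL that is just a ResourceRequirement, roughly like the sketch below (tool name and numbers are only examples; CWL itself expresses per-job cores/RAM, not the placement across two nodes, which is left to the scheduler or runner):

cwlVersion: v1.2
class: CommandLineTool
baseCommand: big_parallel_tool   # hypothetical multi-threaded tool
requirements:
  ResourceRequirement:
    coresMin: 32
    ramMin: 65536    # MiB
inputs: []
outputs: []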

@tetron (Member) commented Oct 30, 2020

@drkennetz the distinction @rupertnash is making is that HPC (in certain communities) implies a single logical job that runs as a set of parallel processes across different nodes that need to coordinate to complete the job. For example, a simulation where each node represents a particular "cell" of the simulated space, and nodes have to be able to interact at the boundaries. That's different from high-throughput computing, where you can split a job into pieces, scatter them over nodes, each piece runs independently of the others, and you gather the results at the end.
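To make the split-scatter-gather pattern concrete, this is the kind of structure CWL expresses natively (the tool and file names below are just for illustration):

cwlVersion: v1.2
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  samples: File[]
outputs:
  results:
    type: File[]
    outputSource: process_sample/result
steps:
  process_sample:
    run: per_sample_tool.cwl   # hypothetical; each invocation is independent
    scatter: sample
    in:
      sample: samples
    out: [result]

Each scattered invocation runs without talking to the others, which is exactly the high-throughput case; the tightly coupled, message-passing jobs described above are the part CWL does not yet model.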

@swzCuroverse (Contributor)

For the general user, the distinction between HPC and HTC is not well known. Additionally, many users will not know what HTC is. Perhaps instead of using HTC, we could have a phrase that encompasses what CWL does? We want to be cognizant that not every CWL user is an expert in these areas.
