Merging syntax with new operator #174

Open
bfrgoncalves opened this issue Nov 30, 2018 · 7 comments
Labels
discussion enhancement New feature or request

Comments

@bfrgoncalves
Collaborator

At the moment, Flowcraft's pipeline string has no syntax for merging the outputs of multiple components into a single component. To allow that, a new operator, the merge operator, should be added.

I propose the following syntax:

( ( (A | B) > C ) | D ) > E,

where the outputs of A & B would be given as input for C and the outputs of C & D would be passed as input for E.

These modifications also require setting up the total number of accepted inputs on the merging components, instead of only accepting one main input.
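A minimal sketch of how the proposed string could be turned into merge edges, assuming `|` separates parallel branches and `>` feeds every open branch on its left into the single component on its right. The names `tokenize` and `parse_merge_string` are hypothetical and are not part of Flowcraft; the parser below only covers this proposal, not the existing fork syntax.

```python
import re
from collections import Counter

# Assumption: component names are plain identifiers.
TOKEN = re.compile(r"\s*([()|>]|[A-Za-z_][A-Za-z0-9_]*)")
OPERATORS = {"(", ")", "|", ">"}


def tokenize(text):
    """Split a pipeline string into parentheses, operators and names."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_merge_string(text):
    """Return (edges, open_ends) for a string using only '|', '>' and parentheses."""
    tokens = tokenize(text)
    edges = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def parallel():
        # parallel := chain ('|' chain)*
        ends = chain()
        while peek() == "|":
            take("|")
            ends = ends + chain()
        return ends

    def chain():
        # chain := atom ('>' NAME)*
        ends = atom()
        while peek() == ">":
            take(">")
            target = take()
            if target in OPERATORS:
                raise ValueError("'>' must be followed by a component name")
            # Every open branch on the left becomes an input of the target.
            edges.extend((source, target) for source in ends)
            ends = [target]
        return ends

    def atom():
        # atom := NAME | '(' parallel ')'
        tok = take()
        if tok == "(":
            ends = parallel()
            take(")")
            return ends
        if tok in OPERATORS:
            raise ValueError(f"unexpected {tok!r}")
        return [tok]

    ends = parallel()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return edges, ends


edges, ends = parse_merge_string("( ( (A | B) > C ) | D ) > E")
# Number of inputs each merging component would have to accept.
inputs_per_component = Counter(target for _, target in edges)
```

Counting the incoming edges per component gives the number of inputs each merging component would need to declare, which is the second change mentioned above.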

@bfrgoncalves bfrgoncalves added enhancement New feature or request discussion labels Nov 30, 2018
@ODiogoSilva ODiogoSilva mentioned this issue Nov 30, 2018
@sjackman
Contributor

sjackman commented Nov 30, 2018

Does the existence of the > merge operator imply that the same expression without > is also meaningful? (((A | B) C) | D) E. I believe in another issue a user asked for this expression to be equivalent to ACE | BCE | DE. Is that also the implied intention of this proposal?

@tiagofilipe12
Collaborator

tiagofilipe12 commented Nov 30, 2018

@sjackman the idea would be to have other checks that forbid that string without the > operator. We already have some checks for malformed strings, so it is 'just' a matter of adding more and/or editing the existing sanity tests.

The other issue, about repeating processes instead of writing duplicated processes in different forks, will be handled separately, since it is simpler to implement. The design options are of course linked.
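As a rough illustration of the kind of extra check described above, here is a sketch that rejects a closing parenthesis followed directly by a component name, i.e. a merge written without `>`. The function name and the rule itself are assumptions made for illustration; this is not one of Flowcraft's existing sanity checks.

```python
import re

# Hypothetical rule: ')' followed directly by a component name (only
# whitespace in between) is treated as a merge that is missing '>'.
IMPLICIT_MERGE = re.compile(r"\)\s*[A-Za-z_]")


def check_explicit_merge(pipeline_str):
    """Raise ValueError if a group is followed by a component without '>'."""
    match = IMPLICIT_MERGE.search(pipeline_str)
    if match is not None:
        raise ValueError(
            f"missing '>' after ')' at position {match.start()}: "
            f"{pipeline_str!r}"
        )
    return True
```

With this rule, `(((A | B) > C) | D) > E` passes, while `(((A | B) C) | D) E` is rejected at the first `)` that is followed by `C`.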

@sjackman
Contributor

How about this syntax:

(((A + B) C) + D) E

That indicates to me that A and B are the inputs to C. It would also allow for this:

(short | long | short + long) unicycler (bandage | quast)

in a single command to run a short-read assembly, long-read assembly, and hybrid assembly.

@ODiogoSilva
Collaborator

(short | long | short + long) unicycler (bandage | quast)

@sjackman I'm not completely sure how this example would work but I'm trying to wrap my head around it because we need to consider the changes required for parsing the string and how to transform that into something readable by the engine.

Your proposal seems cleaner. But there is a problem with the (A | A + B) C syntax:

  • Component C will receive data from both A and A + B. However, component C (like all components in flowcraft) is agnostic about the preceding and following components, which means it has no way to differentiate between the sample_ids that come from only A and the ones that come from A + B.
  • In your example you expect three different assemblies, but in this case only one will be published by the C component.

Btw, that example could also be replicated with the syntax suggested by @bfrgoncalves:

assembly="unicycler (bandage | quast)"
(short $assembly | long $assembly | (short | long) > $assembly )

It's more verbose, but also seems more explicit in separating the different assemblies (or whatever components we use after the merge).
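For reference, the `$assembly` shorthand above expands to one explicit pipeline string. A small sketch using Python's `string.Template` to show the fully expanded form (the use of `Template` is only for illustration and is not how Flowcraft reads pipeline strings):

```python
from string import Template

# The reusable tail of each lane, as defined in the example above.
assembly = "unicycler (bandage | quast)"

template = Template(
    "(short $assembly | long $assembly | (short | long) > $assembly )"
)
expanded = template.substitute(assembly=assembly)
print(expanded)
```

Each of the three lanes ends up with its own copy of `unicycler (bandage | quast)`, which is what keeps the three assemblies separate.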

I'm not discarding your proposal, just discussing the pros and cons of these approaches.

@sjackman
Contributor

In my proposal (A | B) C is equivalent to A C | B C, so (short | long | short + long) unicycler is equivalent to…

short unicycler
long unicycler
(short + long) unicycler

Does that answer your question?
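A minimal sketch of that expansion, assuming the fork is a single parenthesised group and that everything after it is repeated once per branch. The helper names `split_top_level` and `expand_lanes` are hypothetical, not Flowcraft functions.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def expand_lanes(fork, tail):
    """Repeat tail after each branch of a single parenthesised fork group."""
    inner = fork.strip()
    # Assumption: fork is exactly one group such as "(A | B)".
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    lanes = []
    for branch in split_top_level(inner):
        # Keep merged inputs grouped so "+" still binds before the tail.
        head = f"({branch})" if "+" in branch else branch
        lanes.append(f"{head} {tail}")
    return lanes


lanes = expand_lanes("(short | long | short + long)", "unicycler")
```

Here `lanes` holds the three strings listed above, one per branch of the fork.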

@ODiogoSilva
Collaborator

Hmm, in that case the parser would repeat the C component for each lane, under the hood, is that it?

@sjackman
Contributor

Yes, that's right.

4 participants