Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plans to better integrate the 'in' and 'out' requirements? #31

Open
BioComSoftware opened this issue Feb 22, 2017 · 4 comments
Open

Plans to better integrate the 'in' and 'out' requirements? #31

BioComSoftware opened this issue Feb 22, 2017 · 4 comments

Comments

@BioComSoftware
Copy link

Again, I'm not sure if 'issues' is the correct place to ask this. If not, I apologize.

I'm asking this question because; if there are no plans to implement the following, I may tackle it myself - unless you feel there's a specific reason not to.

So, there are two questions:

Q1.

In the tutorial workflow...

def workflow(self):
    foowriter = self.new_task('foowriter', MyFooWriter, out = 'foo.txt')
    fooreplacer = self.new_task('fooreplacer, MyFooReplacer, replacement='bar')
    fooreplacer.in_foo = foowriter.out_foo
    return fooreplacer        

...I notice that we:

  1. Create the 'fooreplacer' task object
  2. Associate fooreplacer.in_foo = foowriter.out_foo
  3. Then return fooreplacer to (basically) trigger all the tasks.

QUESTION: Are there any plans to make this sharing of in and out targets more integrated. For example...

def workflow(self):
    foowriter = self.new_task('foowriter', MyFooWriter, out = 'foo.txt')
    fooreplacer = self.new_task('fooreplacer', MyFooReplacer, replacement='bar', in_foo = foowriter.out_foo)
    return fooreplacer        

I realize this requires significant overhauling of the luigi.parameter class. Which is why I wanted to ask if you were working on it before I dug in :D

Q2:

I also notice that it is the line...

    fooreplacer.in_foo = foowriter.out_foo

...that actually triggers the foowriter.run(). If that line is removed (and foowriter.out_foo replaced manually inside of fooreplacer)...foowriter.run() never occurs.

Are there any plans to have the sciluigi.WorkflowTask class (in this case MyWorkflow) assume to run every new_task line...similar to the 'requires' method from luigi core, but not hardcoded in the class.
I.e. Something like an inline 'requires' statement (see below)...

def workflow(self):
    foowriter = self.new_task('foowriter', MyFooWriter, out = 'foo.txt')
    fooreplacer = self.new_task('fooreplacer', MyFooReplacer, replacement='bar', **requires = foowriter**)
    return fooreplacer        

...OR...

Just to simply assume they come in order.

def workflow(self):
    foowriter = self.new_task('foowriter', MyFooWriter, out = 'foo.txt')
    fooreplacer = self.new_task('fooreplacer', MyFooReplacer, replacement='bar', in_foo = foowriter.out_foo)
    return fooreplacer        

...which results in a backwards running of each new_task, in the reverse order it is listed within the workflow.

@samuell
Copy link
Member

samuell commented Feb 22, 2017

Hi @BioComSoftware and thanks again for reaching out (and yes, issues are great, so that others can make use of the information too)!

Q1

QUESTION: Are there any plans to make this sharing of in and out targets more integrated. For example...

def workflow(self):
foowriter = self.new_task('foowriter', MyFooWriter, out = 'foo.txt')
fooreplacer = self.new_task('fooreplacer', MyFooReplacer, replacement='bar', in_foo = >foowriter.out_foo)
return fooreplacer

No, we don't plan to do this. The reason is that the idea of using fields and functions for the in and out-ports is that we can get auto-completion on the available in-ports and out-ports, in python editors supporting that.

This is really important in our usecases where we want to develop a library of re-usable components. Then, using auto-completion to see the available ports, lessens the need to constantly look up the documentation while using component libraries.

Do you have some specific reasons why you find that would be better? :)

Q2

I'm not sure I follow completely. But to start with, the following line:

fooreplacer.in_foo = foowriter.out_foo

... does not execute the out_foo() method there, but only assigns this method to the field in_foo of fooreplacer, and the method will thus need to be executed later.

This method is actually executed as part of the requires() method of SciLuigi tasks already, as can see by this code in the DependencyHelpers mixin used by sciluigi.Task.

Please correct me if I misunderstood you!

Best Regards // Samuel

@BioComSoftware
Copy link
Author

BioComSoftware commented Feb 22, 2017 via email

@BioComSoftware
Copy link
Author

BioComSoftware commented Feb 22, 2017 via email

@samuell
Copy link
Member

samuell commented Feb 22, 2017

Thanks for the additional info @BioComSoftware! Will have to have a closer look at this when head is clear and fresh :) perhaps tomorrow or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants