Parameters vs Inputs #53

Open
multimeric opened this issue May 7, 2018 · 5 comments

@multimeric

In this example, I notice that you use both in_ parameters (in_foo = None) and normal Luigi parameters (replacement = sciluigi.Parameter()). What is the actual difference? When should I define an input as an in_ field, and when should I make it a sciluigi.Parameter?

@samuell
Member

samuell commented May 7, 2018

Hi @TMiguelT,

"Normal" parameters are for data that can be passed as simple values (strings, integers, floats, booleans, etc.), while in_ inputs are for data that needs to be saved to a file before it is passed on between tasks.
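For comparison with the README example, here is a minimal stdlib-only sketch of the same split. It is not the sciluigi API: ToyTask, FooWriter and FooReplacer are hypothetical stand-ins. They only mimic the convention of assigning an upstream task's out_ method to a downstream in_ field (in sciluigi itself, roughly fooreplacer.in_foo = foowriter.out_foo), while a simple value such as the replacement string is passed as a parameter.

```python
import tempfile
from pathlib import Path


class ToyTask:
    """Hypothetical stand-in for a sciluigi task (not the real sciluigi API)."""

    def __init__(self, workdir, **params):
        self.workdir = Path(workdir)
        # Simple values (strings, numbers, booleans) are passed directly,
        # playing the role of sciluigi.Parameter values.
        self.params = params


class FooWriter(ToyTask):
    def out_foo(self):
        # An out_ method returns the location of the file this task produces.
        return self.workdir / "foo.txt"

    def run(self):
        self.out_foo().write_text("foo\n")


class FooReplacer(ToyTask):
    # An in_ field starts unset and is later wired to an upstream out_ method.
    in_foo = None

    def out_replaced(self):
        return self.workdir / "foo.replaced.txt"

    def run(self):
        text = self.in_foo().read_text()  # file data arrives via the in_ field
        replacement = self.params["replacement"]  # a simple value arrives as a parameter
        self.out_replaced().write_text(text.replace("foo", replacement))


with tempfile.TemporaryDirectory() as tmp:
    writer = FooWriter(tmp)
    replacer = FooReplacer(tmp, replacement="bar")
    replacer.in_foo = writer.out_foo  # wire the method itself, not its result
    writer.run()
    replacer.run()
    result = replacer.out_replaced().read_text()

print(result)
```

Wiring the method rather than its return value means the downstream task only resolves the file location when it actually runs, which is why in_ fields can point at out_ methods while parameters have to be plain values.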

Did that answer your questions?

@multimeric
Author

Thanks, that helps a bit. Can you connect parameters to out_ functions of other tasks in sciluigi, or just in_ fields?

@samuell
Member

samuell commented May 7, 2018

> Can you connect parameters to out_ functions of other tasks in sciluigi, or just in_ fields?

Only in_ fields.

@multimeric
Author

But what if I want to specify a non-file parameter using the output of the previous job? I can't?

@samuell
Member

samuell commented May 7, 2018

> But what if I want to specify a non-file parameter using the output of the previous job? I can't?

Ah, yeah, this is one thing that is not so easy with Luigi/Sciluigi, unless you can write that output to a file somehow and read from that file in the downstream parts of the workflow.
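A minimal sketch of the file-based workaround, using only the standard library. The function names, the threshold value and the JSON file are hypothetical. In a sciluigi workflow the downstream task would receive the file through an in_ field, but the idea is the same: the computed value travels as file contents that are read at run time, not as a parameter fixed beforehand.

```python
import json
import tempfile
from pathlib import Path


def upstream_run(out_path: Path) -> None:
    # Hypothetical upstream step: compute a value at run time and persist it.
    threshold = sum([3, 4, 5]) / 3  # stands in for any computed value
    out_path.write_text(json.dumps({"threshold": threshold}))


def downstream_run(in_path: Path, values: list) -> list:
    # Hypothetical downstream step: read the value from the file at run time,
    # instead of expecting it as a scheduling-time parameter.
    threshold = json.loads(in_path.read_text())["threshold"]
    return [v for v in values if v > threshold]


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "threshold.json"
    upstream_run(shared)
    kept = downstream_run(shared, [1, 4, 6, 9])

print(kept)  # [6, 9]
```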

What we did when we needed this before was to put the part of the workflow that is fed with calculated parameter values into a separate workflow, and call that workflow as a separate Python file. An example where we do this is here (the use case workflow for the sciluigi paper). So, that whole MainWorkflowRunner task is just a wrapper around a Python command that executes a separate, parametrized workflow instance.
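A rough sketch of the wrapper idea, assuming the downstream workflow lives in its own script and takes the computed value on its command line. Everything here is hypothetical: compute_parameter, run_child_workflow and the inline CHILD_WORKFLOW script stand in for a real workflow file and are not taken from the sciluigi paper's code. Only subprocess and sys from the standard library are used.

```python
import subprocess
import sys

# Stand-in for a separate workflow script (e.g. a hypothetical child_workflow.py).
# Here it just echoes the parameter value it was launched with.
CHILD_WORKFLOW = (
    "import sys\n"
    "param = sys.argv[1]\n"
    "print('child workflow ran with threshold=' + param)\n"
)


def compute_parameter() -> float:
    # Runs during the parent's execution phase, so the value is known only now.
    return sum([3, 4, 5]) / 3


def run_child_workflow(threshold: float) -> str:
    # A new process starts a fresh scheduling phase, in which the computed
    # value is an ordinary, already-known parameter.
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_WORKFLOW, str(threshold)],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


output = run_child_workflow(compute_parameter())
print(output)  # child workflow ran with threshold=4.0
```

The design choice is that the parent only needs to know how to launch the child; the child workflow is scheduled from scratch, so the value computed by the parent is just another input on its command line.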

Footnote: This whole problem is related to the fact that Luigi does scheduling and execution in two separate phases, and that parameter values need to be set during the scheduling phase. This means they can't be obtained during execution, since by then scheduling is already done. This is one reason why we have lately gravitated toward fully dataflow-based workflows, where scheduling and execution are done simultaneously, and to this end are developing the scipipe engine instead.
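To make the footnote concrete, here is a toy model in which parameters are frozen into a plan before anything runs. It is not Luigi's scheduler and not scipipe's API: TwoPhaseScheduler, count_records and report are hypothetical. The point is only that a value computed during execution cannot flow back into a parameter fixed during scheduling, whereas a dataflow-style run hands it straight to the next step.

```python
class TwoPhaseScheduler:
    """Hypothetical toy model of two-phase scheduling; not Luigi's implementation."""

    def __init__(self):
        self.plan = []  # (name, function, params), frozen once scheduling ends

    def schedule(self, name, fn, **params):
        # Phase 1 (scheduling): parameter values must already be concrete.
        self.plan.append((name, fn, dict(params)))

    def execute(self):
        # Phase 2 (execution): run the frozen plan; results cannot edit it.
        results = {}
        for name, fn, params in self.plan:
            results[name] = fn(**params)
        return results


def count_records():
    return 42  # a value that only becomes known when this step runs


def report(n):
    return f"n={n}"


sched = TwoPhaseScheduler()
sched.schedule("count", count_records)
# The count does not exist yet at scheduling time, so only a placeholder
# can be passed; the 42 computed during execution never reaches "report".
sched.schedule("report", report, n=None)
two_phase = sched.execute()


def run_dataflow():
    # Dataflow style: each step consumes the upstream value as it arrives,
    # so scheduling and execution effectively happen together.
    return report(count_records())


dataflow = run_dataflow()
print(two_phase)  # {'count': 42, 'report': 'n=None'}
print(dataflow)  # n=42
```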
