diff --git a/README.md b/README.md
index a5c89f4..ed76d87 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,8 @@
 
 ## Project updates
 
-* ***Status as of May 27, 2020***: A rewrite and update, with some possible
-  simplifications is in the works. Some features, such as audit logging and
-  SLURM integration will probably be either dropped or moved out into separate
-  libraries, to make it easier to keep the core library easy to maintain in
-  lockstep with Luigi.
+* ***Update May 27, 2020***: Version 0.9.7 is released, and should work well
+  with Python 3.8 and Luigi 2.8. Please [report any issues](https://github.com/pharmbio/sciluigi)!
 * ***A paper with the motivation and design decisions behind SciLuigi [now available](http://dx.doi.org/10.1186/s13321-016-0179-6)***
 * If you use SciLuigi in your research, please cite it like this:
Lampa S, Alvarsson J, Spjuth O. Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. *J Cheminform*. 2016. doi:[10.1186/s13321-016-0179-6](http://dx.doi.org/10.1186/s13321-016-0179-6). @@ -288,9 +285,11 @@ Contributors Acknowledgements ---------------- -This work is funded by: +This work has been supported by: - [Faculty grants of the dept. of Pharmaceutical Biosciences, Uppsala University](http://www.farmbio.uu.se) - [Bioinformatics Infrastructure for Life Sciences, BILS](https://bils.se) +- [Vinnova](https://www.vinnova.se/) via the project [KoDa - Kollektivtrafikens Datalab](https://www.vinnova.se/p/koda---kollektivtrafikens-datalab/) + as granted to [Savantic](https://savantic.eu/) and others. Many ideas and inspiration for the API is taken from: - [John Paul Morrison's invention and works on Flow-Based Programming](http://jpaulmorrison.com/fbp) diff --git a/README.rst b/README.rst index f04ea2f..6f7d755 100644 --- a/README.rst +++ b/README.rst @@ -4,65 +4,66 @@ Project updates --------------- -- ***Status as of May 27, 2020***: A rewrite and update, with some - possible simplifications is in the works. Some features, such as - audit logging and SLURM integration will probably be either dropped - or moved out into separate libraries, to make it easier to keep the - core library easy to maintain in lockstep with Luigi. -- ***A paper with the motivation and design decisions behind SciLuigi - `now available `__*** -- If you use SciLuigi in your research, please cite it like this: Lampa - S, Alvarsson J, Spjuth O. Towards agile large-scale predictive - modelling in drug discovery with flow-based programming design - principles. *J Cheminform*. 2016. - doi:\ `10.1186/s13321-016-0179-6 `__. 
-- ***A Virtual Machine with a realistic, runnable, example workflow in - a Jupyter Notebook, is available - `here `__*** -- ***Watch a 10 minute screencast going through the basics of using - SciLuigi `here `__*** -- ***See a poster describing the motivations behind SciLuigi - `here `__*** +- **Update May 27, 2020**: Version 0.9.7 is released, and should work + well with Python 3.8 and Luigi 2.8. Please `report any + issues `__! +- **A paper with the motivation and design decisions behind + SciLuigi\ **\ `now + available `__ + + - If you use SciLuigi in your research, please cite it like this: + Lampa S, Alvarsson J, Spjuth O. Towards agile large-scale + predictive modelling in drug discovery with flow-based programming + design principles. *J Cheminform*. 2016. + doi:\ `10.1186/s13321-016-0179-6 `__. + +- **A Virtual Machine with a realistic, runnable, example workflow in a + Jupyter Notebook, is + available\ **\ `here `__ +- **Watch a 10 minute screencast going through the basics of using + SciLuigi\ **\ `here `__ +- **See a poster describing the motivations behind + SciLuigi\ **\ `here `__ About SciLuigi -------------- Scientific Luigi (SciLuigi for short) is a light-weight wrapper library -around `Spotify `__'s +around `Spotify `__\ ’s `Luigi `__ workflow system that aims to make writing scientific workflows more fluent, flexible and modular. Luigi is a flexile and fun-to-use library. It has turned out though that its default way of defining dependencies by hard coding them in each -task's requires() function is not optimal for some type of workflows -common e.g. in bioinformatics where multiple inputs and outputs, complex +task’s requires() function is not optimal for some type of workflows +common e.g. in bioinformatics where multiple inputs and outputs, complex dependencies, and the need to quickly try different workflow connectivity in an explorative fashion is central to the way of working. 
SciLuigi was designed to solve some of these problems, by providing the -following "features" over vanilla Luigi: +following “features” over vanilla Luigi: - Separation of dependency definitions from the tasks themselves, for improved modularity and composability. -- Inputs and outputs implemented as separate fields, a.k.a. "ports", to +- Inputs and outputs implemented as separate fields, a.k.a. “ports”, to allow specifying dependencies between specific input and output-targets rather than just between tasks. This is again to let such details of the network definition reside outside the tasks. - The fact that inputs and outputs are object fields, also allows auto-completion support to ease the network connection work (Works - great e.g. with + great e.g. with `jedi-vim `__). -- Inputs and outputs are connected with an intuitive "single-assignment - syntax". -- "Good default" high-level logging of workflow tasks and execution +- Inputs and outputs are connected with an intuitive “single-assignment + syntax”. +- “Good default” high-level logging of workflow tasks and execution times. - Produces an easy to read audit-report with high level information per task. - Integration with some HPC workload managers. (So far only `SLURM `__ though). -Because of Luigi's easy-to-use API these changes have been implemented -as a very thin layer on top of luigi's own API with no changes at all to +Because of Luigi’s easy-to-use API these changes have been implemented +as a very thin layer on top of luigi’s own API with no changes at all to the luigi core, which means that you can continue leveraging the work already being put into maintaining and further developing luigi by the team at Spotify and others. @@ -70,8 +71,8 @@ team at Spotify and others. 
Workflow code quick demo ------------------------ -***For a brief 10 minute screencast going through the basics below, see -`this link `__*** +**For a brief 10 minute screencast going through the basics below, +see\ **\ `this link `__ Just to give a quick feel for how a workflow definition might look like in SciLuigi, check this code example (implementation of tasks hidden @@ -79,22 +80,22 @@ here for brevity. See Usage section further below for more details): .. code:: python - import sciluigi as sl + import sciluigi as sl - class MyWorkflow(sl.WorkflowTask): - def workflow(self): - # Initialize tasks: - foowrt = self.new_task('foowriter', MyFooWriter) - foorpl = self.new_task('fooreplacer', MyFooReplacer, - replacement='bar') + class MyWorkflow(sl.WorkflowTask): + def workflow(self): + # Initialize tasks: + foowrt = self.new_task('foowriter', MyFooWriter) + foorpl = self.new_task('fooreplacer', MyFooReplacer, + replacement='bar') - # Here we do the *magic*: Connecting outputs to inputs: - foorpl.in_foo = foowrt.out_foo + # Here we do the *magic*: Connecting outputs to inputs: + foorpl.in_foo = foowrt.out_foo - # Return the last task(s) in the workflow chain. - return foorpl + # Return the last task(s) in the workflow chain. + return foorpl -That's it! And again, see the "usage" section just below for a more +That’s it! And again, see the “usage” section just below for a more detailed description of getting to this! Support: Getting help @@ -120,21 +121,21 @@ Install .. code:: bash - pip install sciluigi + pip install sciluigi 2. Now you can use the library by just importing it in your python script, like so: .. code:: python - import sciluigi + import sciluigi Note that you can aliase it to a shorter name, for brevity, and to save keystrokes: .. code:: python - import sciluigi as sl + import sciluigi as sl Usage ----- @@ -144,7 +145,7 @@ vanilla Luigi. Very briefly, it is done in these main steps: 1. Create a workflow tasks class 2. Create task classes -3. 
Add the workflow definition in the workflow class's ``workflow()`` +3. Add the workflow definition in the workflow class’s ``workflow()`` method. 4. Add a run method at the end of the script 5. Run the script @@ -165,11 +166,11 @@ Example: .. code:: python - import sciluigi + import sciluigi - class MyWorkflow(sciluigi.WorkflowTask): - def workflow(self): - pass # TODO: Implement workflow here later! + class MyWorkflow(sciluigi.WorkflowTask): + def workflow(self): + pass # TODO: Implement workflow here later! Create tasks ~~~~~~~~~~~~ @@ -189,42 +190,44 @@ This is done by: 4. Define luigi parameters to the task. 5. Implement the ``run()`` method of the task. +.. _example-1: + Example: ^^^^^^^^ -Let's define a simple task that just writes "foo" to a file named +Let’s define a simple task that just writes “foo” to a file named ``foo.txt``: .. code:: python - class MyFooWriter(sciluigi.Task): - # We have no inputs here - # Define outputs: - def out_foo(self): - return sciluigi.TargetInfo(self, 'foo.txt') - def run(self): - with self.out_foo().open('w') as foofile: - foofile.write('foo\n') + class MyFooWriter(sciluigi.Task): + # We have no inputs here + # Define outputs: + def out_foo(self): + return sciluigi.TargetInfo(self, 'foo.txt') + def run(self): + with self.out_foo().open('w') as foofile: + foofile.write('foo\n') -Then, let's create a task that replaces "foo" with "bar": +Then, let’s create a task that replaces “foo” with “bar”: .. code:: python - class MyFooReplacer(sciluigi.Task): - replacement = sciluigi.Parameter() # Here, we take as a parameter - # what to replace foo with. - # Here we have one input, a "foo file": - in_foo = None - # ... 
and an output, a "bar file": - def out_replaced(self): - # As the path to the returned target(info), we - # use the path of the foo file: - return sciluigi.TargetInfo(self, self.in_foo().path + '.bar.txt') - def run(self): - with self.in_foo().open() as in_f: - with self.out_replaced().open('w') as out_f: - # Here we see that we use the parameter self.replacement: - out_f.write(in_f.read().replace('foo', self.replacement)) + class MyFooReplacer(sciluigi.Task): + replacement = sciluigi.Parameter() # Here, we take as a parameter + # what to replace foo with. + # Here we have one input, a "foo file": + in_foo = None + # ... and an output, a "bar file": + def out_replaced(self): + # As the path to the returned target(info), we + # use the path of the foo file: + return sciluigi.TargetInfo(self, self.in_foo().path + '.bar.txt') + def run(self): + with self.in_foo().open() as in_f: + with self.out_replaced().open('w') as out_f: + # Here we see that we use the parameter self.replacement: + out_f.write(in_f.read().replace('foo', self.replacement)) The last lines, we could have instead written using the command-line ``sed`` utility, available in linux, by calling it on the commandline, @@ -232,12 +235,12 @@ with the built-in ``ex()`` method: .. code:: python - def run(self): - # Here, we use the in-built self.ex() method, to execute commands: - self.ex("sed 's/foo/{repl}/g' {inpath} > {outpath}".format( - repl=self.replacement, - inpath=self.in_foo().path, - outpath=self.out_replaced().path)) + def run(self): + # Here, we use the in-built self.ex() method, to execute commands: + self.ex("sed 's/foo/{repl}/g' {inpath} > {outpath}".format( + repl=self.replacement, + inpath=self.in_foo().path, + outpath=self.out_replaced().path)) Write the workflow definition ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -254,23 +257,25 @@ We do this by: the right ``in_*`` field. 3. Returning the last task in the chain, from the workflow method. +.. _example-2: + Example: ^^^^^^^^ .. 
code:: python - import sciluigi - class MyWorkflow(sciluigi.WorkflowTask): - def workflow(self): - foowriter = self.new_task('foowriter', MyFooWriter) - fooreplacer = self.new_task('fooreplacer', MyFooReplacer, - replacement='bar') + import sciluigi + class MyWorkflow(sciluigi.WorkflowTask): + def workflow(self): + foowriter = self.new_task('foowriter', MyFooWriter) + fooreplacer = self.new_task('fooreplacer', MyFooReplacer, + replacement='bar') - # Here we do the *magic*: Connecting outputs to inputs: - fooreplacer.in_foo = foowriter.out_foo + # Here we do the *magic*: Connecting outputs to inputs: + fooreplacer.in_foo = foowriter.out_foo - # Return the last task(s) in the workflow chain. - return fooreplacer + # Return the last task(s) in the workflow chain. + return fooreplacer Add a run method to the end of the script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -278,12 +283,12 @@ Add a run method to the end of the script Now, the only thing that remains, is adding a run method to the end of the script. -You can use luigi's own ``luigi.run()``, or our own two methods: +You can use luigi’s own ``luigi.run()``, or our own two methods: 1. ``sciluigi.run()`` 2. ``sciluigi.run_local()`` -The ``run_local()`` one, is handy if you don't want to run a central +The ``run_local()`` one, is handy if you don’t want to run a central scheduler daemon, but just want to run the workflow as a script. Both of the above take the same options as ``luigi.run()``, so you can @@ -291,9 +296,9 @@ for example set the main class to use (our workflow task): :: - # End of script .... - if __name__ == '__main__': - sciluigi.run_local(main_task_cls=MyWorkflow) + # End of script .... + if __name__ == '__main__': + sciluigi.run_local(main_task_cls=MyWorkflow) Run the workflow ~~~~~~~~~~~~~~~~ @@ -302,9 +307,9 @@ Now, you should be able to run the workflow as simple as: .. code:: bash - python myworkflow.py + python myworkflow.py -... 
provided of course, that the workflow is saved in a file named +… provided of course, that the workflow is saved in a file named myworkflow.py. More Examples @@ -337,7 +342,7 @@ Known Limitations either. Both of the limitations are due to the fact that Luigi does scheduling -and execution separately (with the exception of Luigi's `dynamic +and execution separately (with the exception of Luigi’s `dynamic dependencies `__, but they work only for upstream tasks, not downstream tasks, which we would need). @@ -350,9 +355,10 @@ Changelog --------- - 0.9.3b4 -- Support for Python 3 (Thanks to @jeffcjohnson for contributing - this!). -- Bug fixes. + + - Support for Python 3 (Thanks to @jeffcjohnson for contributing + this!). + - Bug fixes. Contributors ------------ @@ -363,13 +369,17 @@ Contributors Acknowledgements ---------------- -This work is funded by: - `Faculty grants of the dept. of Pharmaceutical -Biosciences, Uppsala University `__ - -`Bioinformatics Infrastructure for Life Sciences, -BILS `__ +This work has been supported by: - `Faculty grants of the dept. of +Pharmaceutical Biosciences, Uppsala +University `__ - `Bioinformatics +Infrastructure for Life Sciences, BILS `__ - +`Vinnova `__ via the project `KoDa - +Kollektivtrafikens +Datalab `__ +as granted to `Savantic `__ and others. Many ideas and inspiration for the API is taken from: - `John Paul -Morrison's invention and works on Flow-Based +Morrison’s invention and works on Flow-Based Programming `__ Publications using SciLuigi @@ -395,7 +405,7 @@ some of the limitations we have still faced with Python/Luigi/SciLuigi: `SciPipe `__. `SciPipe `__ leverages some of the successful parts -of Luigi's API, such as the flexible file name formatting, but replaces +of Luigi’s API, such as the flexible file name formatting, but replaces the Luigi scheduler with a custom, novel and very light-weight implicit dataflow scheduler written in Go. 
We find that it makes life much easier for complex workflow constructs as those involving cross validation,
diff --git a/setup.py b/setup.py
index a534b36..8f0ee31 100644
--- a/setup.py
+++ b/setup.py
@@ -18,7 +18,7 @@
 
 setup(
     name='sciluigi',
-    version='0.9.6b7',
+    version='0.9.7',
     description='Helper library for writing dynamic, flexible workflows in luigi',
     long_description=long_description,
     author='Samuel Lampa',
@@ -45,7 +45,7 @@
         'Programming Language :: Python :: 2',
         'Programming Language :: Python :: 3',
         'Programming Language :: Python :: 2.7',
-        'Programming Language :: Python :: 3.4',
+        'Programming Language :: Python :: 3.7',
         'Topic :: Scientific/Engineering',
         'Topic :: Scientific/Engineering :: Bio-Informatics',
         'Topic :: Scientific/Engineering :: Chemistry',