A new interface for pytask #361
Sounds basically good to me.
Python Inputs
I often encountered the problem of tasks not running after changing some config values, and ended up adding a
Get Rid of
Introduction
pytask will become three years old soon 🥳. What better way to celebrate this birthday than with an open discussion about pytask's interface and maybe a revision of significant components?
We should agree on some principles to guide the design decisions in this discussion.
Each section will discuss one detail, and sections might build on each other.
The problem with `depends_on` and `produces`
`depends_on` and `produces` are magic keywords the users need to know about. They must learn to use decorators that inject the correct values at runtime.
`depends_on` and `produces` obfuscate what's behind the values. It can be a path, a dictionary of paths, or a list of paths. Just look at these two signatures; which example is clearer?
Getting rid of the decorators
We can eliminate the decorators by parsing `depends_on` and `produces` from the default arguments of a task function. This is already possible if a task is marked with `@pytask.mark.task`.
Getting rid of `depends_on`
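A minimal sketch of the two styles, with hypothetical file and function names. It uses plain functions so that it runs without pytask; the decorators, which exist in pytask as `@pytask.mark.depends_on` and `@pytask.mark.produces`, appear only as comments.

```python
from pathlib import Path


# Decorator style: the values are injected at runtime, so the signature
# alone does not reveal whether depends_on is a path, a list, or a dict.
#
#   @pytask.mark.depends_on("input.txt")
#   @pytask.mark.produces("output.txt")
def task_copy_decorated(depends_on, produces):
    produces.write_text(depends_on.read_text())


# Default-argument style: the same information lives in the signature, and
# the dependency gets a descriptive name instead of depends_on.
def task_copy(
    path_to_input: Path = Path("input.txt"),
    produces: Path = Path("output.txt"),
) -> None:
    produces.write_text(path_to_input.read_text())
```

Because the second function is ordinary Python, it can also be called directly with other paths, for example in a test.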
As you saw in the example before and here, `depends_on` is neither necessary nor clearer. The values are better attached as default arguments.
This interface is not new. The `task` decorator already allows exploiting default arguments and offers some extra features (no `task_` prefix necessary, passing `kwargs`).
But pytask currently only looks for path dependencies in `depends_on`. The implementation could probably be changed without causing many problems so that every task input except `produces` is treated as a pytree, and whenever a `pathlib.Path` is encountered, it is parsed as a dependency.
`str` should probably not be supported outside of `depends_on`. Using `str` in `depends_on` should be deprecated as well to keep everything simpler.
Since `pathlib.Path`s are always assumed to be `PathNode`s, how can I pass a `pathlib.Path` as a normal function argument? Just wrap the path in a `PythonNode`, the neutral element for pytask. A `PythonNode` is explained in the next section.
Possible changes
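The proposed rule, treating every input except `produces` as a pytree and collecting each `pathlib.Path` leaf as a dependency, could be sketched roughly as follows. The helpers `iter_leaves` and `collect_path_dependencies` are hypothetical names for illustration, and the hand-written traversal only stands in for whatever pytree utilities pytask would actually use.

```python
import inspect
from pathlib import Path
from typing import Any, Callable, Dict, Iterator, List


def iter_leaves(tree: Any) -> Iterator[Any]:
    """Yield the leaves of a nested structure of dicts, lists, and tuples."""
    if isinstance(tree, dict):
        for value in tree.values():
            yield from iter_leaves(value)
    elif isinstance(tree, (list, tuple)):
        for item in tree:
            yield from iter_leaves(item)
    else:
        yield tree


def collect_path_dependencies(task: Callable[..., Any]) -> Dict[str, List[Path]]:
    """Map each argument except ``produces`` to the paths among its defaults."""
    dependencies: Dict[str, List[Path]] = {}
    for name, parameter in inspect.signature(task).parameters.items():
        if name == "produces" or parameter.default is inspect.Parameter.empty:
            continue
        # Only pathlib.Path leaves count; strings are deliberately ignored.
        paths = [
            leaf for leaf in iter_leaves(parameter.default) if isinstance(leaf, Path)
        ]
        if paths:
            dependencies[name] = paths
    return dependencies


def task_merge(
    inputs={"a": Path("a.csv"), "b": [Path("b.csv"), "not-a-path"]},
    n_rows=10,
    produces=Path("merged.csv"),
):
    ...


dependencies = collect_path_dependencies(task_merge)
```

Here only `inputs` contributes dependencies: the string leaf and the integer argument are skipped, and `produces` is excluded by name.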
The following are non-breaking changes.
If `depends_on` is present, do not change behavior. Also, strings are still allowed instead of paths.
If `depends_on` is not present, try to parse dependencies from all arguments of the task function that are not `produces`, and only if the value in the pytree is a `pathlib.Path`. Strings as paths are not allowed anymore.
Allowing for more dependency and product types.
Currently, pytask builds the DAG only from paths, and it cannot handle different objects, significantly limiting users.
To add a new type like a `PythonNode`, the user creates a new class that inherits from `MetaNode`.
The metaclass requires a new type to implement two properties: a `state` property and a `value` property. The state is a hash or some almost unique value signaling changes in the node's value. It is `None` if the node does not exist. The `value` property retrieves the value from the node so that it can be passed into the task.
The Python node could then be
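A minimal sketch of such a node, not the actual implementation. The `MetaNode` below is a stand-in that contains only the two properties named above. Hashing the `repr` of the value and returning a constant state when hashing is off are both assumptions made for illustration.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Any, Optional


class MetaNode(ABC):
    """Stand-in base class with only the two required properties."""

    @property
    @abstractmethod
    def state(self) -> Optional[str]:
        ...

    @property
    @abstractmethod
    def value(self) -> Any:
        ...


class PythonNode(MetaNode):
    """Wraps an arbitrary Python object so it can take part in the DAG."""

    def __init__(self, value: Any = None, hash: bool = False) -> None:
        self._value = value
        self._hash = hash

    @property
    def state(self) -> Optional[str]:
        if self._hash:
            # Assumption: repr() is a good enough fingerprint of the value.
            return hashlib.sha256(repr(self._value).encode()).hexdigest()
        # Assumption: a constant state, so changes never trigger a rerun.
        return "0"

    @property
    def value(self) -> Any:
        return self._value
```

With `hash=False` the node behaves as a neutral wrapper: its value is passed to the task, but changing it does not change the state.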
Hashing Python inputs
What is a use case for this new `PythonNode`?
Currently, Python inputs are not processed as dependencies. If they alter the signature of the task, changing them might trigger a rerun of the task, but it is not a given.
Here, we declare that the task receives `additional_kwargs`. Internally, pytask will wrap them in a `PythonNode` container. We say the dictionary should be hashed to detect changes.
(We do not want hashing by default since some inputs can be substantial and hashing too costly.)
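The declaration described above might look like the following sketch. The `PythonNode` here is a reduced, hypothetical stand-in (with `repr`-based hashing as an assumption), and the task name, file names, and dictionary contents are made up for illustration.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class PythonNode:
    """Hypothetical stand-in, reduced to what this example needs."""

    value: Any = None
    hash: bool = False

    @property
    def state(self) -> str:
        if self.hash:
            return hashlib.sha256(repr(self.value).encode()).hexdigest()
        return "0"


def task_fit_model(
    path_to_data: Path = Path("data.csv"),
    # hash=True opts in to change detection for this input.
    additional_kwargs: PythonNode = PythonNode(value={"n_iter": 100}, hash=True),
    produces: Path = Path("model.txt"),
) -> None:
    ...


# Editing the dictionary changes the state, which would signal a rerun.
old_state = PythonNode(value={"n_iter": 100}, hash=True).state
new_state = PythonNode(value={"n_iter": 200}, hash=True).state
rerun_needed = old_state != new_state
```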
Clearer types
The previous example introduced `PythonNode`s and how they can be used to declare and hash Python dependencies.
A problem with the previous example is that the types of function inputs are obfuscated by pytask's node types and no longer reveal the original types. Especially for testing and other purposes, it would be advantageous if you could call the function without any knowledge of pytask.
Luckily, a fairly recent Python feature, `typing.Annotated`, introduced in PEP 593, allows adding arbitrary metadata to type hints.
The previous example can be rewritten to
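A sketch of what such a rewrite could look like, assuming Python 3.9 or newer. `PythonNode` is again a hypothetical stand-in that carries only the hashing flag. The standard-library call `typing.get_type_hints(..., include_extras=True)` shows how a tool could read the metadata back.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Dict, get_type_hints


@dataclass(frozen=True)
class PythonNode:
    """Hypothetical stand-in carrying only the hashing flag."""

    hash: bool = False


def task_fit_model(
    # The real type stays visible; the node is attached as metadata.
    additional_kwargs: Annotated[Dict[str, Any], PythonNode(hash=True)] = {
        "n_iter": 100
    },
) -> Dict[str, Any]:
    return additional_kwargs


# A framework could recover the node from the annotation like this.
hints = get_type_hints(task_fit_model, include_extras=True)
metadata = hints["additional_kwargs"].__metadata__
```

The function can now be called with a plain dictionary, without importing or knowing anything about the node types.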
How to get rid of (path) products?
The solution to replace products is less straightforward than for dependencies. Currently, there are two ways to declare products: continue using `@pytask.mark.produces`, or use only `produces` as a magic argument of the task function, which already removes the decorator.
If we want to declare products independently of the `produces` argument name, we could use our nodes with a new argument.
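One way this could look, purely as an assumption: the node takes a flag, here called `product`, and is attached to the argument with `typing.Annotated`. The flag name, the stand-in `PathNode`, and the helper `find_product_arguments` are all hypothetical and not part of the proposal's text.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Annotated, Any, Callable, List, get_type_hints


@dataclass(frozen=True)
class PathNode:
    """Hypothetical stand-in; the ``product`` flag is an assumed name."""

    product: bool = False


def find_product_arguments(task: Callable[..., Any]) -> List[str]:
    """Return the names of arguments whose annotation marks them as products."""
    hints = get_type_hints(task, include_extras=True)
    names = []
    for name, hint in hints.items():
        # Plain annotations have no __metadata__ and are skipped.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, PathNode) and meta.product:
                names.append(name)
    return names


def task_copy(
    path_to_input: Path = Path("input.txt"),
    path_to_copy: Annotated[Path, PathNode(product=True)] = Path("copy.txt"),
) -> None:
    path_to_copy.write_text(path_to_input.read_text())


products = find_product_arguments(task_copy)
```

With this approach the product argument can have any name, and the function remains callable with ordinary paths.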