Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect parameter collisions when using the @inherits and @requires decorators #2566

Closed
KenyaDonDraper opened this issue Oct 31, 2018 · 2 comments
Labels

Comments

@KenyaDonDraper
Copy link
Contributor

KenyaDonDraper commented Oct 31, 2018

At a project am working on, we are heavily utilizing the @requires decorator to ease parameter inheritance pain. This is a big project so we have split up our monolithic pipeline into smaller pipelines each being worked on by a different team. Some of these pipelines also derive from others using normal Python inheritance. Because of our use case (many interdependent pipelines and use of @requires) we have found ourselves in the situation where two or several tasks in the pipelines chain define the same parameter name in different contexts. Since the context is different the existing functionality of @requires (and @inherits) of silently masking a similarly named upstream parameter is undesirable. It would be better if Luigi could avoid this silent parameter masking or give a warning at the very least.

This problem that I have described, is what we are unofficially calling the "parameter collision" problem. There are two flavors of this problem that we have encountered;

Case A:

import luigi
from luigi.util import requires


###  Pipeline I
class TaskA(luigi.Task):
    param_a = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('/tmp/log-{t.param_a}'.format(t=self))

@requires(TaskA)
class TaskB(luigi.Task):
    param_b = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('/tmp/log-{t.param_b}'.format(t=self))


###  Pipeline II
@requires(TaskB)
class TaskC(luigi.Task):
    param_a = luigi.Parameter()

Case B:

import luigi
from luigi.util import requires

###  Pipeline I
class TaskX(luigi.Task):
    param_a = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('/tmp/log-{t.param_a}'.format(t=self))


###  Pipeline II
class TaskY(luigi.Task):
    param_a = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('/tmp/log-{t.param_a}'.format(t=self))


###  Pipeline III
@requires(TaskA, TaskB)
class TaskZ(luigi.Task):
    param_b= luigi.Parameter()

In the first case (Case A), TaskC will silently mask param_a that had been passed down from the upstream TaskA. In the 2nd case (Case B), TaskZ will possibly also mask param_a from either TaskX or TaskY. Such a masking could have been unintentional and therefore it's dangerous for parameters to be silently overwritten (masked).

In a big project, with different teams working on different parts, it might be impossible to realize you have masked/overwritten a parameter that had been defined in an upstream task. Therefore, we need a mechanism for detecting and warning the developer when such a collision of parameter names occurs. We evaluated a few options but the most promising solution seems to be;

  • Modify the @requires and @inherits decorator to throw an exception when a parameter collision is detected. In addition, add a new argument, ignore_collisions=(), to allow excluding select parameters from the collision detection. The intention is that when a parameter collision is detected at a particular point in the task inheritance chain, the developer has the choice of either renaming one of the parameters or include it in the ignore_collisions list if the collision is desired. I have included a WIP pull request here.

Has anyone else encountered this problem and if so do you think the suggested solution is sufficient? If the answer to both of these questions is in the affirmative, I will rework the pull request to include tests and documentation.

@stale
Copy link

stale bot commented Feb 28, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

@stale stale bot added the wontfix label Feb 28, 2019
@stale stale bot closed this as completed Mar 14, 2019
jeverling added a commit to kimetrica/luigi that referenced this issue May 30, 2022
jeverling added a commit to kimetrica/luigi that referenced this issue May 30, 2022
jeverling added a commit to kimetrica/luigi that referenced this issue May 30, 2022
@jeverling
Copy link
Contributor

jeverling commented May 30, 2022

I have added a PR (#3169) based on the work of @KenyaDonDraper that also adds a config switch so this functionality can be enabled without possibly affecting existing workflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants