Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More informative exception text for failing pre/post conditions using pdp.cond.Condition objects #68

Open
shaypal5 opened this issue Nov 19, 2021 · 5 comments

Comments

@shaypal5
Copy link
Collaborator

shaypal5 commented Nov 19, 2021

If I'm using a pdp.cond.Condition object, for example pdp.cond.HasAllColumns(['a', 'b']), as a precondition (or a post-condition) in my pipeline stage and it fails, I expect an informative exception message, such as "Not all required columns ['a', 'b'] found in input dataframe" (or better yet, "Required columns ['a'] not found in input dataframe", in the case only "a" is missing).

This is something that can be achieved by using custom descriptions of specific Condition classes, as they provide semantic information on why they failed (or at least, they could).

At the moment, I get a generic "Precondition failed in stage...". :(

@yarkhinephyo
Copy link
Contributor

yarkhinephyo commented Nov 19, 2021

Hi Shy,

For displaying an error message specific to a condition, (Example - "Not all required columns ['a', 'b'] found in input dataframe" when only ['a'] is missing), I am thinking add a new attribute to Condition to store self.exdesc which is populated during object construction by subclasses such as HasNoColumn. Then inside PdPipelineStage, if self._prec_arg returns False during precondition check, self._prec_arg.exdesc is accessed as the string for display.

For displaying a more specific error message, (Example - "Required columns ['a'] not found in input dataframe" when only ['a'] is missing), I am thinking the def __call__ of condition inner classes such as class _NoColumnsFunc, can return ConditionResponse object which contains a boolean and a string instead of just a boolean. Then I guess there will have to be an abstract parent class for these inner classes to declare type signature of __call__.

Does it align with what you have in mind?

@shaypal5
Copy link
Collaborator Author

I'll say let's start with the more basic type of prints, because thinking about this again it seems like most stages will have to be re-written with a more complex and less efficient code to be able to tell this.

And the simpler implementation can just have a custom attribute that each condition must populate, with the Condition class providing a default string.

I would go with a name that shows this attribute is part of the api, so Condition.error_message.

Then, the different methods of PipelineStage using pre and post conditions can do (for example fit):

def fit(self, X, y=None, exraise=None, verbose=False):
    if exraise is None:
        exraise = self._exraise
    if self._compound_prec(X):
        if verbose:
            msg = '- ' + '\n  '.join(textwrap.wrap(self._appmsg))
            print(msg, flush=True)
        res_df = self._fit_transform(X, verbose=verbose)
        if exraise and not self._compound_post(df=res_df):
            try:
                raise FailedPostconditionError(self._compound_post.error_message)
            except AttributeError:
                raise FailedPostconditionError(self._exmsg_post)
        return X
    if exraise:
        try:
            raise FailedPreconditionError(self._compound_prec.error_message)
        except AttributeError:
            raise FailedPreconditionError(self._exmsg)
    return X

@shaypal5
Copy link
Collaborator Author

We have to do this because a condition can be any callable object that can return a bool value for pandas.DataFrame objects, and might not have any of these attributes.

@yarkhinephyo
Copy link
Contributor

Thanks for the detailed explanation!

Just to check again, Condition class should look something like this:

class Condition(object):
    def __init__(self, func, error_message=None, fittable=None):
        self._func = func
        self._fittable = fittable
        if error_message is not None:
            self.error_message = error_message
...

Also would it be more helpful for user if we give them the "Precondition failed in stage..." either way?
Something along this line -

if exraise:
    try:
        raise FailedPreconditionError(self._exmsg + self._compound_prec.error_message)
    except AttributeError:
        raise FailedPreconditionError(self._exmsg)

@shaypal5
Copy link
Collaborator Author

Yes, just add the error_message param after fittable, so argument order is backward compatible.

Regarding 2: Yeah, good idea. Also, a fix to my mistake: It should be self._prec_arg.error_message. _compound_prec is a method of PdPipelineStage, not the Condition object that might have been supplied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants