Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect precondition and postcondition error messages #70

Open
yarkhinephyo opened this issue Nov 28, 2021 · 1 comment
Open

Incorrect precondition and postcondition error messages #70

yarkhinephyo opened this issue Nov 28, 2021 · 1 comment

Comments

@yarkhinephyo
Copy link
Contributor

For pipeline stages provided by the pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may not return the correct error messages.

Example Code

import pandas as pd; import pdpipe as pdp;
df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
pline = pdp.PdPipeline([
  pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(df)

The problem is there for post argument too.

Expected Result

Error message indicates that the user-provided precondition check failed. Or more specifically, column x was not found in the df.

Actual Result

Error message indicates that the precondition check failed because column a was not found in the df.

---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in fit_transform(self, X, y, exraise, verbose, time)
    962                 stage.application_context = self.application_context
--> 963                 inter_x = stage.fit_transform(
    964                     X=inter_x,

c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in fit_transform(self, X, y, exraise, verbose)
    411         if exraise:
--> 412             raise FailedPreconditionError(self._exmsg)
    413         return X

FailedPreconditionError: FreqDrop stage failed because column a was not found in input dataframe.

The above exception was the direct cause of the following exception:

PipelineApplicationError                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17544/3358191693.py in <module>
      4   pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
      5 ])
----> 6 pline.apply(df)

c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in apply(self, df, exraise, verbose, time)
    883             return res
    884         self.fit_context = PdpApplicationContext()
--> 885         res = self.fit_transform(
    886             X=df,
    887             exraise=exraise,

c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in fit_transform(self, X, y, exraise, verbose, time)
    968                 )
    969             except Exception as e:
--> 970                 raise PipelineApplicationError(
    971                     f"Exception raised in stage [ {i}] {stage}"
    972                 ) from e

PipelineApplicationError: Exception raised in stage [ 0] PdPipelineStage: Drop values with frequency < 2 in column a.

Reason

In PdPipelineStage, _compound_prec checks for both user's precondition and the subclass's precondition but only one type of error message is provided for both.

def _compound_prec(self, df):
  if self._prec_arg:
    return self._prec_arg(df)
  return self._prec(df)

Possible Fix

Check the user-provided precondition _prec_arg and the subclass-provided precondition _prec separately and display different FailedPreconditionError for each of them.

@shaypal5
Copy link
Collaborator

Ok. Now seeing this and trying this for myself I think pre- and post-conditions error messages should be their business, not that of the pipeline stage.

Recall I wrote the pipeline stage class first, way before I wrote conditions, and then I expected these messages to come up when the precondition failed.

I think PdPipelineStage attribute of exmsg should now become a message meant to be used only when pipeline application itself fails, and not the precondition. It should probably be very generic, unless the author of a specific stage can be very sure that a pipeline can fail for only one reason (if the precondition passed).

Also, let's work more closely on the writing this time, so write to me here when you have concrete suggestions. Note: This change shouldn't break the API whatsoever; just re-word current exmsg default values and the documentation regarding it.

You're welcome to tackle this yourself, if you want. I want to focus on current issues with pickling stages and pipelines. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants