You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For pipeline stages provided by the pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may not return the correct error messages.
Error message indicates that the user-provided precondition check failed. Or more specifically, column x was not found in the df.
Actual Result
Error message indicates that the precondition check failed because column a was not found in the df.
---------------------------------------------------------------------------
FailedPreconditionError Traceback (most recent call last)
c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in fit_transform(self, X, y, exraise, verbose, time)
962 stage.application_context = self.application_context
--> 963 inter_x = stage.fit_transform(
964 X=inter_x,
c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in fit_transform(self, X, y, exraise, verbose)
411 if exraise:
--> 412 raise FailedPreconditionError(self._exmsg)
413 return X
FailedPreconditionError: FreqDrop stage failed because column a was not found in input dataframe.
The above exception was the direct cause of the following exception:
PipelineApplicationError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17544/3358191693.py in <module>
4 pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
5 ])
----> 6 pline.apply(df)
c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in apply(self, df, exraise, verbose, time)
883 return res
884 self.fit_context = PdpApplicationContext()
--> 885 res = self.fit_transform(
886 X=df,
887 exraise=exraise,
c:\users\yarkh\desktop\cs\projects\pdpipe\pdpipe\core.py in fit_transform(self, X, y, exraise, verbose, time)
968 )
969 except Exception as e:
--> 970 raise PipelineApplicationError(
971 f"Exception raised in stage [ {i}] {stage}"
972 ) from e
PipelineApplicationError: Exception raised in stage [ 0] PdPipelineStage: Drop values with frequency < 2 in column a.
Reason
In PdPipelineStage, _compound_prec checks for both user's precondition and the subclass's precondition but only one type of error message is provided for both.
def _compound_prec(self, df):
if self._prec_arg:
return self._prec_arg(df)
return self._prec(df)
Possible Fix
Check the user-provided precondition _prec_arg and the subclass-provided precondition _prec separately and display different FailedPreconditionError for each of them.
The text was updated successfully, but these errors were encountered:
Ok. Now seeing this and trying this for myself I think pre- and post-conditions error messages should be their business, not that of the pipeline stage.
Recall I wrote the pipeline stage class first, way before I wrote conditions, and then I expected these messages to come up when the precondition failed.
I think PdPipelineStage attribute of exmsg should now become a message meant to be used only when pipeline application itself fails, and not the precondition. It should probably be very generic, unless the author of a specific stage can be very sure that a pipeline can fail for only one reason (if the precondition passed).
Also, let's work more closely on the writing this time, so write to me here when you have concrete suggestions. Note: This change shouldn't break the API whatsoever; just re-word current exmsg default values and the documentation regarding it.
You're welcome to tackle this yourself, if you want. I want to focus on current issues with pickling stages and pipelines. :)
For pipeline stages provided by the
pdpipe.basic_stages
, supplying conditions to theprec
andpost
keyword arguments may not return the correct error messages.Example Code
The problem is there for
post
argument too.Expected Result
Error message indicates that the user-provided precondition check failed. Or more specifically, column
x
was not found in thedf
.Actual Result
Error message indicates that the precondition check failed because column
a
was not found in thedf
.Reason
In
PdPipelineStage
,_compound_prec
checks for both user's precondition and the subclass's precondition but only one type of error message is provided for both.Possible Fix
Check the user-provided precondition
_prec_arg
and the subclass-provided precondition_prec
separately and display differentFailedPreconditionError
for each of them.The text was updated successfully, but these errors were encountered: