New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update data checks to return DataCheckResults object #1444
Conversation
…alml into 1325_data_checks_returns_dict
Codecov Report
@@ Coverage Diff @@
## main #1444 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 220 220
Lines 14672 14687 +15
=========================================
+ Hits 14665 14680 +15
Misses 7 7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - very nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin Looks good! I left a comment about how defining a class could make things easier/cleaner in the long run.
""" | ||
messages = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the downside of using DataCheckMessageType
as the keys is that the user needs to import the enum in order to look at the warnings and results. I think that can be annoying, especially if the user is running automl and trying to access data_checks_results
:
- Creating a
DataCheckResults
class. - Using string values like "warning" and "error" as keys in the dict you have here.
The benefits of 1 is thatL
- We can expose the warnings and errors as instance properties and the user doesn't have to import another class.
- We can also have an api for checking if there are warnings or errors, or it's empty as opposed to
if self._data_check_results[DataCheckMessageType.ERROR]
. The nice thing about this is that if we ever add data to the dict, theis_empty
wouldn't break users code downstream but the dict would because they'd be checkingif results == {DataCheckMessageType.WARNING: [], DataCheckMessageType.ERROR: []}
The downside is that it's another class but our docs already show how to get data from the data checks results.
The benefit of 2 is that it'd be a small change but would definitely lead to typos lol.
My vote is for 1 I think it'd be fine to keep it as-is for now too. I saw that you left to use DataCheckMessageType as key?
in the PR description so wanted to offer my thoughts lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooo I like this suggestion a lot. I agree, the reason why I left the comment about using DataCheckMessageType
as key was because as I was updating the code, I too felt the inconvenience / frustration of having to import the enum everywhere, but as @dsherry had mentioned, since we already have the enums in place we might as well use them.
That being said, I like the idea of creating a separate DataCheckResults
class, and having warnings
and errors
as attributes 🤔 That way, the user doesn't need to directly type in the keys as strings, and any typos will result in an AttributeError
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton @angela97lin sure I'm on board with having a DataCheckResults
class! This would make it easy to access the errors and warnings. We could also define a to_json
method which returns native python types instead of DataCheckError
/DataCheckWarning
instances.
Is your proposal that we do that instead of merging this PR?
UPDATE: we discussed this, further changes tracked by #1430.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin LGTM, 🚢 ! I suggest you merge this and then start #1430 because #1430 builds on this work.
""" | ||
messages = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton @angela97lin sure I'm on board with having a DataCheckResults
class! This would make it easy to access the errors and warnings. We could also define a to_json
method which returns native python types instead of DataCheckError
/DataCheckWarning
instances.
Is your proposal that we do that instead of merging this PR?
UPDATE: we discussed this, further changes tracked by #1430.
Closing in favor of #1448 |
Closes #1325
validate
always returns a DataCheckResults object now; no messages means returning{DataCheckMessageType.WARNING: [], DataCheckMessageType.ERROR: []}
DataCheckMessageType
as key?