New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-15947: fixed skipped_column error in Python #16164
base: rel-3.46.0
Are you sure you want to change the base?
Conversation
…specified when calling h2o.H2OFrame.
…ta frames to h2O data frames.
…specified or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this fix @wendycwong, but I must admit that I still don't understand why the old test is still in place.
@@ -126,11 +126,9 @@ def H2OFrame_from_H2OFrame(): | |||
|
|||
|
|||
def H2OFrame_skipped_columns_is_BUGGY(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can rename the test now that it's fixed :)
raise ValueError( | ||
"length of col_names should be equal to the number of columns parsed: %d vs %d" | ||
% (len(column_names), parse_column_len)) | ||
"length of col_names minus lenght of skipped_columns should equal the number of columns parsed: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: length
@@ -871,10 +871,10 @@ def parse_setup(raw_frames, destination_frame=None, header=0, separator=None, co | |||
if column_names is not None: | |||
if not isinstance(column_names, list): raise ValueError("col_names should be a list") | |||
if (skipped_columns is not None) and len(skipped_columns)>0: | |||
if (len(column_names)) != parse_column_len: | |||
if ((len(column_names)-len(skipped_columns)) != parse_column_len) and (len(column_names) != parse_column_len): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why with the current behaviour, parse_column_len
must equal one or the other? why do we not always enforce len(column_names)-len(skipped_columns) == parse_column_len
for example?
Especially given that this test is applied only if len(skipped_columns)>0
and given that above we define:
parse_column_len = len(j["column_types"]) if skipped_columns is None else (len(j["column_types"])-len(skipped_columns))
therefore I don't understand the case where we would have len(column_names) == parse_column_len
, as checked in the original test.
Should we not only test :
if (len(column_names)-len(skipped_columns)) != parse_column_len:
and if not, doesn't it show a more profound inconsistency/bug?
issue: #15947
The problem here is when called with h2o.H2OFrame, we did not take into account of skipped columns when trying to figure out the final column counts.
Fixed the bug and added Python test from Seb.
Fixed the bug in R and added R test.