I've tried dabl with my own data, replicating the "quickstart guide". The resulting model is able to make predictions without issue, but when I tried running `dabl.explain(simple_clf)` I got the following traceback:
```
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/dabl/explain.py in _extract_inner_estimator(estimator, feature_names)
    235                 feature_names = inner_estimator.steps[0][1].get_feature_names(
--> 236                     feature_names)
    237             except TypeError:

TypeError: get_feature_names() takes 1 positional argument but 2 were given

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-36-37d2322eefb7> in <module>
----> 1 dabl.explain(simple_clf)

/usr/local/lib/python3.7/site-packages/dabl/explain.py in explain(estimator, X_val, y_val, target_col, feature_names, n_top_features)
    123
    124     inner_estimator, inner_feature_names = _extract_inner_estimator(
--> 125         estimator, feature_names)
    126
    127     if X_val is not None:

/usr/local/lib/python3.7/site-packages/dabl/explain.py in _extract_inner_estimator(estimator, feature_names)
    236                     feature_names)
    237             except TypeError:
--> 238                 feature_names = inner_estimator.steps[0][1].get_feature_names()
    239
    240     # now we have input feature names for the final step

/usr/local/lib/python3.7/site-packages/dabl/preprocessing.py in get_feature_names(self)
    607                 # FIXME that is really strange?!
    608                 ohe_cols = self.columns_[self.columns_.map(cols)]
--> 609                 feature_names.extend(ohe.get_feature_names(ohe_cols))
    610             elif name == "remainder":
    611                 assert trans == "drop"

/usr/local/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py in get_feature_names(self, input_features)
    530                 "input_features should have length equal to number of "
    531                 "features ({}), got {}".format(len(self.categories_),
--> 532                 len(input_features)))
    533
    534     feature_names = []

ValueError: input_features should have length equal to number of features (16), got 14
```
My `train_df_clean` has the following structure (dabl type detection):

```
            continuous  dirty_float  low_card_int  categorical   date  free_string  useless
id               False        False         False        False  False         True    False
wt_ratio          True        False         False        False  False        False    False
cat_a            False        False         False        False  False         True    False
cat_b            False        False         False        False  False         True    False
cat_c            False        False         False        False  False         True    False
cat_d            False        False         False        False  False         True    False
cat_e            False        False         False         True  False        False    False
cat_f            False        False         False         True  False        False    False
cat_g            False        False         False         True  False        False    False
alt_id           False        False         False        False  False         True    False
type_1           False        False         False        False  False         True    False
type_2           False        False         False        False  False         True    False
class            False        False         False         True  False        False    False
subclass_1       False        False         False         True  False        False    False
subclass_2       False        False         False        False  False         True    False
in_cap           False        False         False        False  False         True    False
tax_1            False        False         False         True  False        False    False
tax_2            False        False         False         True  False        False    False
col              False        False         False        False  False         True    False
cap_loc          False        False         False         True  False        False    False
grp_1            False        False         False         True  False        False    False
grp_2            False        False         False        False  False         True    False
grp_3            False        False         False        False  False         True    False
temp             False        False         False        False  False         True    False
notes_1          False        False         False        False  False         True    False
notes_2          False        False         False        False  False         True    False
meas_1            True        False         False        False  False        False    False
meas_2            True        False         False        False  False        False    False
meas_3            True        False         False        False  False        False    False
meas_4           False        False         False         True  False        False    False
len               True        False         False        False  False        False    False
loc_area1        False        False         False         True  False        False    False
loc_area2        False        False         False         True  False        False    False
loc_area3        False        False         False         True  False        False    False
loc_area4        False        False         False         True  False        False    False
```
And I can see that I have 14 categorical columns, which seems to align with the "got 14" in the error, but I would have assumed that GaussianNB would also use the continuous fields.
Thank you in advance!
I have done some more manual digging and determined that the issue appears to stem from a categorical column being treated as a continuous random variable. When I pulled other features out, a decision tree was the best model and the `dabl.explain` function worked without issue. However, once the NB model won out, it broke the function. I also saw some odd behavior when NaN values existed in the columns.
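One way to work around the mis-detection described above is to make the intended type explicit before handing the frame to dabl, either via the `type_hints` argument of `dabl.clean` or by forcing a pandas `category` dtype. A minimal sketch of the dtype approach (the column name and values are illustrative, assuming dabl honours pandas dtypes):

```python
import numpy as np
import pandas as pd

# Numeric-looking group codes with missing values mixed in: the kind of
# column that type detection might treat as continuous.
df = pd.DataFrame({"grp_1": [1, 2, 1, np.nan, 2]})

# Force the dtype so the categorical intent is unambiguous.
df["grp_1"] = df["grp_1"].astype("category")
print(df["grp_1"].dtype)  # category
```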
Thanks for reporting! Indeed the explain function is not very robust yet.
Scikit-learn makes mapping input to output columns a bit hard, which will hopefully be improved by scikit-learn/scikit-learn#16772
I'll see what I can do in the meantime; dabl also needs some updates for the current version of sklearn, which I'll probably try to make work first.
Awesome, thank you! I was planning on looking through the code to try and understand more as well. But thank you for your work on this! I think it is a quite cool library.