
Issue when initializing explainer through TabularExplainer and KernelExplainer #463

Open
lucazav opened this issue Nov 18, 2021 · 1 comment

lucazav commented Nov 18, 2021

I have a trained regression model (a VotingEnsemble model obtained through training with Azure AutoML) and I'd like to generate an explainer using TabularExplainer.
My dataset has a column ('CALENDAR_DATE') of type datetime64[ns], which my model handles correctly (predict method works fine).
After importing the TabularExplainer class, I tried to initialize my explainer with:

features = X_train.columns

explainer = TabularExplainer(model,
                             X_train,
                             features=features,
                             model_task='regression')

but I get the following error:

RuntimeError: cuML is required to use GPU explainers.
Check https://rapids.ai/start.html for more
information on how to install it.
The above exception was the direct cause of the following exception:
[...]
ValueError: Could not find valid explainer to explain model

I get the same error message when I explicitly force use_gpu=False:

explainer = TabularExplainer(model,
                             X_train,
                             features=features,
                             model_task='regression',
                             use_gpu=False)

Thus, I proceeded to explicitly initialize a KernelExplainer with:

explainer = KernelExplainer(model,
                            X_train,
                            features=features,
                            model_task='regression')

but I received the error:

float() argument must be a string or a number, not 'Timestamp'

Therefore I changed the 'CALENDAR_DATE' column type to string, with:

X_train_copy = X_train.copy()
X_train_copy['CALENDAR_DATE'] = X_train_copy['CALENDAR_DATE'].astype(str)

After this, both TabularExplainer and KernelExplainer correctly work when initializing the explainers (with the modified dataset X_train_copy).

Why does this happen?

@imatiach-msft (Collaborator)

@lucazav I believe the issue with the bad cuML error message appeared in interpret-community 0.18.0 and was fixed in 0.21.0:

#450
See description:

Based on experience of debugging with customer, when TabularExplainer fails with default use_gpu=False on GPUKernelExplainer it prints the last warning, even though it will always fail. This PR separates it out so it only runs when use_gpu flag is on. The previous logic would skip every explainer if use_gpu=True other than GPUKernelExplainer, but still for some reason run it even if use_gpu=False. By separating it out, the customer will once again see the most useful error message from the last default catch-all KernelExplainer.

With the latest version you will instead see the error you encountered:
float() argument must be a string or a number, not 'Timestamp'

This seems to be due to the timestamp column. However, it seems like the explainers should be able to support this datatype, based on:

https://github.com/interpretml/interpret-community/blob/master/python/interpret_community/dataset/dataset_wrapper.py#L25

It should automatically featurize the timestamp column and explain numeric fields:

                tmp_dataset[time_col_name + '_year'] = tmp_dataset[time_col_name].map(lambda x: x.year)
                tmp_dataset[time_col_name + '_month'] = tmp_dataset[time_col_name].map(lambda x: x.month)
                tmp_dataset[time_col_name + '_day'] = tmp_dataset[time_col_name].map(lambda x: x.day)
                tmp_dataset[time_col_name + '_hour'] = tmp_dataset[time_col_name].map(lambda x: x.hour)
                tmp_dataset[time_col_name + '_minute'] = tmp_dataset[time_col_name].map(lambda x: x.minute)
                tmp_dataset[time_col_name + '_second'] = tmp_dataset[time_col_name].map(lambda x: x.second)

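For illustration, the same decomposition can be reproduced outside the library with pandas' .dt accessor. This is a minimal sketch; the DataFrame and its values are made up, not taken from the issue:

```python
import pandas as pd

# Hypothetical frame with a single datetime column, mirroring the
# featurization the dataset wrapper applies for MimicExplainer.
df = pd.DataFrame({"CALENDAR_DATE": pd.to_datetime(
    ["2021-11-18 09:30:15", "2021-12-01 17:05:00"])})

col = "CALENDAR_DATE"
for part in ("year", "month", "day", "hour", "minute", "second"):
    # .dt exposes the same components the wrapper extracts via map()
    df[col + "_" + part] = getattr(df[col].dt, part)
df = df.drop(columns=[col])  # only numeric fields remain for the explainer

print(df.loc[0].tolist())  # [2021, 11, 18, 9, 30, 15]
```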
I think I see the problem. This featurization only exists for the MimicExplainer, based on this search:
https://github.com/interpretml/interpret-community/search?q=apply_timestamp_featurizer

So basically all other explainers (except for MimicExplainer) can't handle a timestamp-typed column. You can convert the column to numeric (e.g. the float value in seconds), but for some explainers, like the LIME explainer, this won't work well: LIME won't be able to sample around the value correctly to get meaningful results.

For KernelExplainer it might work more sensibly, since it just replaces the value with the background data rather than perturbing it. The feature importance might be correct in the sense of how important the column is, but it will be difficult to interpret in the sense that increasing or decreasing the value results in a specific change to the output (which you can't assume anyway with SHAP values, but which will be especially hard to assume here), since there may be many complex cyclical/seasonal relationships in the time feature.

I think it's more useful to break the time feature into components like the above and view feature importances in terms of day/hour/month/etc. to get a better understanding of how it may influence the model's output.
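The numeric conversion mentioned above could look like the following sketch (a made-up two-row frame; a datetime64[ns] column cast to int64 yields nanoseconds since the Unix epoch):

```python
import pandas as pd

# Hypothetical example: convert a datetime64[ns] column to epoch seconds
df = pd.DataFrame({"CALENDAR_DATE": pd.to_datetime(
    ["1970-01-01", "1970-01-02"])})

# The int64 view is nanoseconds since the epoch; divide to get seconds
df["CALENDAR_DATE"] = df["CALENDAR_DATE"].astype("int64") / 1e9

print(df["CALENDAR_DATE"].tolist())  # [0.0, 86400.0]
```

This keeps the explainer's float() conversion happy, but as noted, the resulting importance is harder to read than the per-component decomposition.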
