
Perform global explainability on the inference dataset #54

Open
NickNtamp opened this issue Dec 7, 2022 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

NickNtamp commented Dec 7, 2022

As of now, the pipeline performs explainability per inference row.
Explainability for the whole inference dataset may be useful in the future.
Some related code has already been written:

  • Pipeline
def create_xai_pipeline_classification_per_inference_dataset(
    training_set: pd.DataFrame,
    target: str,
    inference_set: pd.DataFrame,
    type_of_task: str,
    load_from_path=None,
) -> Dict[str, Dict[str, float]]:

    xai_dataset = training_set.drop(columns=[target])
    explainability_report = {}

    # Build a mapping dict which will be used later to map the explainer's
    # feature indices to the feature names.
    mapping_dict = dict(enumerate(xai_dataset.columns))

    # Explainability for both classification tasks.
    # We have to revisit this in the future: when the model is loaded from
    # the file system we don't care whether it is binary or multiclass.

    if type_of_task == 'multiclass_classification':

        # Give the option of retrieving the local model.
        if load_from_path is not None:
            model = joblib.load('{}/lgb_multi.pkl'.format(load_from_path))
        else:
            model, evaluation = create_multiclass_classification_training_model_pipeline(training_set, target)

        # The explainer must be built in both branches, not only when a new
        # model is trained (otherwise it is undefined when loading from disk).
        explainer = lime.lime_tabular.LimeTabularExplainer(
            xai_dataset.values,
            feature_names=xai_dataset.columns.tolist(),
            mode="classification",
            random_state=1,
        )

        for inference_row in range(len(inference_set)):
            exp = explainer.explain_instance(inference_set.values[inference_row], model.predict)
            med_report = exp.as_map()
            temp_dict = dict(list(med_report.values())[0])
            map_dict = {mapping_dict[idx]: val for idx, val in temp_dict.items()}
            explainability_report['row{}'.format(inference_row)] = map_dict

    elif type_of_task == 'binary_classification':

        # Give the option of retrieving the local model.
        if load_from_path is not None:
            model = joblib.load('{}/lgb_binary.pkl'.format(load_from_path))
        else:
            model, evaluation = create_binary_classification_training_model_pipeline(training_set, target)

        explainer = lime.lime_tabular.LimeTabularExplainer(
            xai_dataset.values,
            feature_names=xai_dataset.columns.tolist(),
            mode="classification",
            random_state=1,
        )

        for inference_row in range(len(inference_set)):
            exp = explainer.explain_instance(inference_set.values[inference_row], model.predict_proba)
            med_report = exp.as_map()
            temp_dict = dict(list(med_report.values())[0])
            map_dict = {mapping_dict[idx]: val for idx, val in temp_dict.items()}
            explainability_report['row{}'.format(inference_row)] = map_dict

    return explainability_report
  • Unit tests
def test_create_xai_pipeline_classification_per_inference_dataset(self):
    binary_class_report = create_xai_pipeline_classification_per_inference_dataset(df_binary, "target", df_binary_inference, "binary_classification")
    multi_class_report = create_xai_pipeline_classification_per_inference_dataset(df_multi, "target", df_multi_inference, "multiclass_classification")
    binary_contribution_check_one = binary_class_report["row0"]["worst perimeter"]
    binary_contribution_check_two = binary_class_report["row2"]["worst texture"]
    multi_contribution_check_one = multi_class_report["row0"]["hue"]
    multi_contribution_check_two = multi_class_report["row9"]["proanthocyanins"]
    assert len(binary_class_report) == len(df_binary_inference)
    assert len(multi_class_report) == len(df_multi_inference)
    assert round(binary_contribution_check_one, 3) == 0.253
    assert round(binary_contribution_check_two, 2) == -0.09
    assert round(multi_contribution_check_one, 2) == -0.08
    assert round(multi_contribution_check_two, 3) == -0.023
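For the global report itself, one straightforward option (a hypothetical sketch, not part of the existing pipeline; the helper name `aggregate_global_explainability` is made up here) is to collapse the per-row report returned by the function above into a single per-feature score, using the mean absolute contribution so positive and negative local effects do not cancel out:

```python
from typing import Dict


def aggregate_global_explainability(
    explainability_report: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Collapse per-row LIME contributions into one global score per feature.

    Averages the absolute contribution of each feature across all rows in
    which it appears, so opposite-signed local effects do not cancel.
    """
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for row_report in explainability_report.values():
        for feature, contribution in row_report.items():
            totals[feature] = totals.get(feature, 0.0) + abs(contribution)
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


# Tiny fabricated report, shaped like the pipeline's output:
report = {
    "row0": {"hue": 0.2, "proanthocyanins": -0.1},
    "row1": {"hue": -0.4, "proanthocyanins": 0.3},
}
global_scores = aggregate_global_explainability(report)
```

A mean-absolute aggregation is only one choice; signed means or a dedicated global method (e.g. SHAP summary values) could fit here too.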
@NickNtamp NickNtamp changed the title Future enhancement: Perform explanability for the whole inference dataset Future enhancement: Perform explainability for the whole inference dataset Dec 7, 2022
@momegas momegas added enhancement New feature or request help wanted Extra attention is needed component/api labels Jan 20, 2023
@momegas momegas added this to the 🐶 Q1 2023 milestone Jan 20, 2023
@momegas momegas changed the title Future enhancement: Perform explainability for the whole inference dataset [Roadmap] Perform explainability for the whole inference dataset Jan 20, 2023
@momegas momegas modified the milestones: 🐶 Q1 2023, 😻 Q2 2023 Jan 22, 2023
@momegas momegas modified the milestones: 😻 Q2 2023, 🐶 Q1 2023 Feb 7, 2023
@momegas momegas changed the title [Roadmap] Perform explainability for the whole inference dataset Perform explainability for the whole inference dataset Feb 15, 2023
@momegas momegas added good first issue Good for newcomers and removed component/api labels Feb 15, 2023
@aditkay95

@momegas I would like to start working on this

momegas commented Feb 17, 2023

That would be great. Please start by opening a draft PR with the proposed changes. I'll assign this to you. Thanks 🙏

@momegas momegas changed the title Perform explainability for the whole inference dataset Perform global explainability on the inference dataset Feb 22, 2023