MattJBritton/ForestfortheTrees

Binder

ForestfortheTrees

GIF of user exploring individual trees

This library generates visual explanations of Gradient Boosting models. I recommend you jump in through the Interactive Notebook on Binder. This interactive Jupyter notebook is an Explainable that showcases the value of the library and provides sample code.

You can see more of my work on my website, or check out the presentation I gave at VISxAI 2019. My presentation even got mentioned in Uncharted's VIS 2019 Highlights!

Installation

As an alternative to Binder, you can run notebook.ipynb locally by cloning the repository and then following these steps:

  1. Navigate into the package directory. cd ForestForTheTrees
  2. Create the conda environment from the environment file. conda env create -f binder/environment.yml
  3. Activate the conda environment. conda activate ForestForTheTrees
  4. Run the postBuild script, which installs the JupyterLab extension required to display interactive widgets: bash binder/postBuild. Alternatively, run the underlying command directly: jupyter labextension install @jupyter-widgets/jupyterlab-manager
  5. Fire up Jupyter Lab, run all cells and begin interacting with the notebook. jupyter lab notebook.ipynb

Note that a recent version of Jupyter Lab (included in the environment) is required to run this notebook. The classic Jupyter Notebook interface will not work, at least out of the box, because of some peculiarities in how Altair, ipywidgets, and Jupyter interact.

I recommend running all cells as soon as the notebook is opened. Due to the nature of the interactive widgets, it is not possible to save the state, so the notebook is saved without output. If you are perusing the full document, each cell will have run by the time you get to it. This applies whether viewing locally or via Binder.

Usage

As mentioned above, the best way to get a sense of how Gradient Boosting models can be explained with ForestForTheTrees is to run the Binder link above. To get started quickly, adapt the minimal example below:

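#imports (the module name ForestForTheTrees is an assumption; adjust it to match your checkout)
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
import ForestForTheTrees as ft

#load dataset
dataset_df = pd.read_csv("Some_file.csv")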
target_column = "Target"  #the value to predict

#build model
model = GradientBoostingRegressor(
    n_estimators = 100
)

#fit model
model.fit(
    dataset_df.drop(target_column, axis = 1),
    dataset_df.loc[:,target_column]
) #you should build a good model here using train/test split

#initialize ForestForTheTrees with dataset, model, and target
f2t = ft.ForestForTheTrees(
    dataset = dataset_df, #pass bike instead to use the sample dataset
    model = model,
    target_col = target_column #must match the target column the model was fit on
)

#extract the underlying structure of the model
#this must be called before displaying the visual explanation
f2t.extract_components()

#output the visual explanation at the selected fidelity
f2t.explain(
    fidelity_threshold = .95
)

5-chart explanation for bike dataset
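
The example above fits the model on the full dataset for brevity, and its comment recommends a train/test split. Below is a minimal sketch of that step, assuming scikit-learn's train_test_split. The synthetic data from make_regression and the column names f0 to f3 are illustrative stand-ins for your own CSV; they are not part of this library.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pd.read_csv("Some_file.csv"); column names are hypothetical.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
dataset_df = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])
target_column = "Target"
dataset_df[target_column] = y

# Hold out 20% of the rows so the model is scored on data it never saw.
features = dataset_df.drop(target_column, axis=1)
target = dataset_df[target_column]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

# Fit on the training rows only.
model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R^2 on both splits; a large gap between them suggests overfitting.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
```

The fitted model can then be passed to ft.ForestForTheTrees in the same way as in the example above.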

Development

This library is under active development - please review the Issues tab for current priorities. Feature requests and bug reports are welcome! If you find this library useful, please feel free to message me and let me know how it went.

Developed using Python and the Python data science stack, particularly numpy, pandas, and scikit-learn. Altair was used for data visualization.

About

Interactive visualization of ensemble ML algorithms (e.g. Gradient Boosting Classifiers) for explainable ML.
