Is there any way to interpret the DF21 model, like graphviz for decision trees or SHAP for XGB? #12

Closed
Qiao-27 opened this issue Feb 2, 2021 · 18 comments
Labels
feature request New feature or request

Comments

@Qiao-27

Qiao-27 commented Feb 2, 2021

No description provided.

@xuyxu
Member

xuyxu commented Feb 2, 2021

Q1: Is there any way to interpret the model?
A1: Yes, you can inspect the structure of a forest in DF21 (https://github.com/LAMDA-NJU/Deep-Forest/blob/master/deepforest/forest.py#L331). Suppose that a forest has 100 base decision trees; then:

  • features and thresholds are both lists of size 100; the i-th element stores the splitting features / thresholds used in all internal nodes of the i-th tree (an array of shape (n_internal_nodes,))
  • childrens is also a list of size 100; the i-th element stores the left and right child IDs for all internal nodes of the i-th tree (an array of shape (n_internal_nodes, 2))
  • values is a list whose i-th element stores the predictions for all leaf nodes of the i-th tree (an array of shape (n_leaf_nodes, n_classes))
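
To make the layout above concrete, here is a minimal sketch that walks a single tree stored as four such arrays. The toy arrays, the helper name predict_one, the "<=" split rule, and the child-ID encoding (a positive ID points at an internal node, a non-positive ID c points at leaf -c) are all assumptions made for illustration; the actual conventions should be checked against forest.py.

```python
import numpy as np

# Toy arrays for ONE tree, i.e. features[i], thresholds[i], childrens[i], values[i].
# Node 0 splits on feature 0 at 0.5: left -> leaf 0, right -> internal node 1.
# Node 1 splits on feature 1 at 2.0: left -> leaf 1, right -> leaf 2.
features = np.array([0, 1])                    # shape (n_internal_nodes,)
thresholds = np.array([0.5, 2.0])              # shape (n_internal_nodes,)
childrens = np.array([[0, 1], [-1, -2]])       # shape (n_internal_nodes, 2)
values = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.5, 0.5]])                # shape (n_leaf_nodes, n_classes)


def predict_one(x, features, thresholds, childrens, values):
    """Return the leaf prediction reached by sample x (hypothetical helper)."""
    node = 0  # start at the root, assumed to be internal node 0
    while True:
        # ASSUMPTION: samples with feature value <= threshold go left.
        go_left = x[features[node]] <= thresholds[node]
        child = childrens[node, 0 if go_left else 1]
        # ASSUMPTION: a non-positive child ID c refers to leaf number -c.
        if child <= 0:
            return values[-child]
        node = child


print(predict_one(np.array([0.7, 1.0]), features, thresholds, childrens, values))
```

Averaging such per-tree leaf predictions over all 100 trees would then give the forest-level output, assuming the forest aggregates by simple averaging.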

Q2: Something like graphviz for decision trees?
A2: We will work on this! The tree structure in DF21 is a reduced version of the decision tree in Scikit-Learn, for the sake of speed and memory efficiency. It should be possible to make it compatible with sklearn.tree.export_graphviz.
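
For reference, this is roughly what the Scikit-Learn side looks like on an ordinary DecisionTreeClassifier (not on a DF21 tree, for which no such export exists yet in this thread). The iris data and the max_depth setting are arbitrary choices for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Fit a small Scikit-Learn tree; the dataset and depth are illustrative only.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(clf, out_file=None, filled=True)
print(dot_source[:80])
```

The returned string can be rendered with the Graphviz tools. A DF21 tree would need to expose the attributes that export_graphviz reads before it could be passed in the same way.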

@xuyxu xuyxu added the feature request New feature or request label Feb 2, 2021
@xuyxu xuyxu mentioned this issue Feb 2, 2021
13 tasks
@xuyxu
Member

xuyxu commented Feb 2, 2021

Closed via issue #14

@xuyxu xuyxu closed this as completed Feb 2, 2021
@Maryom

Maryom commented Feb 13, 2021

Hey,
Is there any example code showing how to interpret DF21 with SHAP?

@xuyxu xuyxu reopened this Feb 14, 2021
@xuyxu
Member

xuyxu commented Feb 14, 2021

> Hey,
> Is there any example code showing how to interpret DF21 with SHAP?

After taking a look at this page, I think it should be feasible to implement methods that export the tree information in DF21 to SHAP.

@xuyxu
Member

xuyxu commented Feb 14, 2021

Are you willing to take part in the development of this feature request, @Maryom and @Qiao-27 😄? After a quick discussion with the team members, we think that improving the interpretability of DF21 should be a top priority.

@Maryom

Maryom commented Feb 14, 2021

@xuyxu Hey, I will try it by following the steps on the page.
If there are any additional steps, please let me know.

@Maryom

Maryom commented Feb 14, 2021

Hey,

Could you please make DF21 compatible with Python 3.8.2? I'm a Mac user, and SHAP needs Python 3.8.2 to run on Mac.

I'm trying to run pip3 install --verbose -e . in the Deep-Forest directory, and I got the following error:

ERROR: Command errored out with exit status 1: /Library/Developer/CommandLineTools/usr/bin/python3 /Library/Python/3.8/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /var/folders/xv/x44j_r3n2jlfvsy3k_44w1rw0000gn/T/tmprantsua9 Check the logs for full command output.

Before, I was able to run it successfully with Python 3.9.1, but now I need to use Python 3.8.2.

@xuyxu
Member

xuyxu commented Feb 14, 2021

Hi @Maryom, could you delete the pyproject.toml in the home directory and re-install the package in editable mode?

If there is still a problem, could you give me more traceback information?

@Maryom

Maryom commented Feb 14, 2021

Hi @xuyxu, after deleting pyproject.toml I got the following errors:

Error compiling Cython file:
    ------------------------------------------------------------
    ...
            shape[2] = <np.npy_intp> self.max_n_classes

            cdef np.ndarray arr
            arr = np.PyArray_SimpleNewFromData(3, shape, np.NPY_DOUBLE, self.value)
            Py_INCREF(self)
            arr.base = <PyObject*> self
              ^
    ------------------------------------------------------------

    deepforest/tree/_tree.pyx:908:11: Assignment to a read-only property

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
            arr = PyArray_NewFromDescr(<PyTypeObject *> np.ndarray,
                                       <np.dtype> NODE_DTYPE, 1, shape,
                                       strides, <void*> self.nodes,
                                       np.NPY_DEFAULT, None)
            Py_INCREF(self)
            arr.base = <PyObject*> self

@xuyxu
Member

xuyxu commented Feb 14, 2021

What versions of Cython and NumPy are you using?
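
One quick way to collect this information is a short script like the sketch below. The helper name installed_versions and the list of distribution names are illustrative assumptions; it relies only on importlib.metadata from the standard library (Python 3.8+).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Distribution names assumed relevant to this build problem.
print(installed_versions(["numpy", "Cython", "shap", "deep-forest"]))
```

Running it with the same interpreter that pip3 uses should show which NumPy and Cython versions the build actually sees.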

@Maryom

Maryom commented Feb 14, 2021

The numpy version is 1.20.1, because SHAP needs this version.

The cython version is 0.29.21.

I got the following warnings:

clang: deepforest/_forest.c
    In file included from deepforest/_forest.c:635:
    In file included from /Library/Python/3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4:
    In file included from /Library/Python/3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
    In file included from /Library/Python/3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:
    /Library/Python/3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with "          "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
    #warning "Using deprecated NumPy API, disable it with " \

@Maryom

Maryom commented Feb 14, 2021

Okay, after I downgraded numpy I got "Successfully installed deep-forest". However, now I'm NOT able to run SHAP because it needs numpy version 1.20.1.

Any solution please?

@xuyxu
Member

xuyxu commented Feb 14, 2021

I think we can safely use a lower version of SHAP that only requires NumPy 1.19 :-)

@Maryom

Maryom commented Feb 14, 2021

It worked perfectly with SHAP 0.37.0

@xuyxu
Member

xuyxu commented Feb 14, 2021

Great! I will open a PR later and list the steps we are going to take there 😄. Thanks!

@Maryom

Maryom commented Feb 14, 2021

@xuyxu Nice, I hope we will add SHAP support ASAP 🙏🏼

@xuyxu
Member

xuyxu commented Feb 14, 2021

cc @Maryom

Steps:

  • Implement the tree explainer for a single forest in deep forest
  • Implement the explainer for the entire deep forest model
  • Implement the tree explainer for the entire deep forest model (Optional)

It may be a good choice to start with the first step. You can create a file in deepforest/ named _explainer.py; here is a docstring for the method used in the first step:

def get_shap_explainer_forest(forest):
    """
    Get the tree explainer for a forest estimator in the deep forest.

    Parameters
    ----------
    forest : :obj:`forest`
        The forest estimator that we want to explain.

    Returns
    -------
    TreeExplainer : :obj:`shap.TreeExplainer`
        Tree explainer for the forest estimator.
    """

The forest structure used in DF21 is available here. Suppose that we have a total of 100 trees in a forest; then the lists features, thresholds, childrens and values will each have 100 elements, and the meaning of the i-th element is as follows:

  • features[i]: splitting attributes used in the internal nodes of the i-th decision tree in the forest; shape: (n_internal_nodes,)
  • thresholds[i]: splitting cut-offs used in the internal nodes of the i-th decision tree in the forest; shape: (n_internal_nodes,)
  • childrens[i]: left / right child IDs for the internal nodes of the i-th decision tree in the forest; shape: (n_internal_nodes, 2)
  • values[i]: predictions for the leaf nodes of the i-th decision tree in the forest; shape: (n_leaf_nodes, n_outputs)

We can first follow the instructions in the SHAP documentation and see if there is any problem when passing the forest model in DF21 to shap.TreeExplainer.
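
As a possible starting point, the sketch below flattens the per-tree arrays described above into the dictionary layout used by SHAP's custom tree model example (children_left, children_right, children_default, features, thresholds, values, node_sample_weight). The helper name to_shap_tree_dict, the toy arrays, and the child-ID encoding (a positive ID is an internal node, a non-positive ID c is leaf -c) are assumptions for illustration. The node_sample_weight array is only a placeholder, since per-node sample counts are not part of the structure listed above.

```python
import numpy as np


def to_shap_tree_dict(features, thresholds, childrens, values):
    """Flatten one tree into a single node array (hypothetical helper).

    Internal node j keeps index j; leaf k becomes index n_internal + k.
    """
    n_internal = len(features)
    n_nodes = n_internal + len(values)

    def flat_id(child):
        # ASSUMED encoding: child > 0 is an internal node, child <= 0 is leaf -child.
        return child if child > 0 else n_internal - child

    # Leaves are marked with -1 children, following the Scikit-Learn convention.
    children_left = np.full(n_nodes, -1, dtype=np.int32)
    children_right = np.full(n_nodes, -1, dtype=np.int32)
    for j in range(n_internal):
        children_left[j] = flat_id(childrens[j, 0])
        children_right[j] = flat_id(childrens[j, 1])

    flat_features = np.full(n_nodes, -2, dtype=np.int32)
    flat_features[:n_internal] = features
    flat_thresholds = np.full(n_nodes, -2.0)
    flat_thresholds[:n_internal] = thresholds

    # Only leaves carry predictions; internal rows stay at zero.
    flat_values = np.zeros((n_nodes, values.shape[1]))
    flat_values[n_internal:] = values

    return {
        "children_left": children_left,
        "children_right": children_right,
        "children_default": children_right.copy(),  # no missing-value handling
        "features": flat_features,
        "thresholds": flat_thresholds,
        "values": flat_values,
        "node_sample_weight": np.ones(n_nodes),  # PLACEHOLDER, not real counts
    }


# Toy tree: two internal nodes, three leaves, two outputs.
tree_dict = to_shap_tree_dict(
    np.array([0, 1]),
    np.array([0.5, 2.0]),
    np.array([[0, 1], [-1, -2]]),
    np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]),
)
# Next step, per SHAP's custom tree example: shap.TreeExplainer({"trees": [tree_dict]})
print(tree_dict["children_left"], tree_dict["children_right"])
```

Whether shap.TreeExplainer accepts the result without further changes is exactly what would need to be tested; the placeholder weights in particular would have to be replaced before relying on path-dependent attributions.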

Since the forest structure is quite different from that in Scikit-Learn, I am not sure whether things will go smoothly. Feel free to ask me in this PR if you have any problems, and I will reply ASAP.

BTW, you can open the PR once _explainer.py is created, and we can have more discussions there 😄

@xuyxu
Member

xuyxu commented May 12, 2021

Closed via #14

@xuyxu xuyxu closed this as completed May 12, 2021

3 participants