Variable importance () plot #1411

Open
hanneleer opened this issue Apr 26, 2024 · 4 comments

@hanneleer

Dear all,

I was wondering if someone could help me with the following:

I use the variable importance measure to describe which variables are chosen most often by the causal forest algorithm. However, I now want to know at which levels/values each variable tended to split (on average). Is it possible to grow a tree on the most important variables?

Thanks a lot already!

@erikcs
Member

erikcs commented May 1, 2024

Hi @hanneleer, you could calculate that using the function get_tree that gives you details on the split variable and level for every tree. You can also fit a new forest on the most important variables, Algorithm 1 here gives an example of that. For other visualizations you might find some of the example plots in this tutorial useful.
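To make the aggregation idea concrete, here is a minimal, language-agnostic sketch in Python of how per-tree split details could be pooled across a forest. It assumes each tree has already been exported into a plain list of node records; the field names (`is_leaf`, `split_variable`, `split_value`, `left_child`, `right_child`), the 0-based root index, and the helper name `summarize_splits` are all hypothetical choices for illustration, so check them against the structure that get_tree actually returns before adapting this.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_splits(trees, max_depth=2):
    """Pool split values by (depth, variable) across many trees.

    `trees` is a list of trees; each tree is a list of node dicts
    (hypothetical layout, root at index 0, root depth = 1).
    """
    values = defaultdict(list)  # (depth, variable) -> list of split values
    for nodes in trees:
        stack = [(0, 1)]  # (node index, depth)
        while stack:
            index, depth = stack.pop()
            node = nodes[index]
            if node["is_leaf"] or depth > max_depth:
                continue
            values[(depth, node["split_variable"])].append(node["split_value"])
            stack.append((node["left_child"], depth + 1))
            stack.append((node["right_child"], depth + 1))
    return {
        key: {"count": len(v), "mean": mean(v), "median": median(v)}
        for key, v in sorted(values.items())
    }


# Three toy trees standing in for trees exported from a fitted forest.
leaf = {"is_leaf": True}
tree_a = [
    {"is_leaf": False, "split_variable": 0, "split_value": 0.5,
     "left_child": 1, "right_child": 2},
    leaf,
    {"is_leaf": False, "split_variable": 1, "split_value": 2.0,
     "left_child": 3, "right_child": 4},
    leaf,
    leaf,
]
tree_b = [
    {"is_leaf": False, "split_variable": 0, "split_value": 1.5,
     "left_child": 1, "right_child": 2},
    leaf,
    leaf,
]
tree_c = [
    {"is_leaf": False, "split_variable": 1, "split_value": 4.0,
     "left_child": 1, "right_child": 2},
    leaf,
    leaf,
]

summary = summarize_splits([tree_a, tree_b, tree_c], max_depth=2)
for (depth, variable), stats in summary.items():
    print(depth, variable, stats)
```

Grouping by depth keeps root splits separate from deeper ones, so the depth-1 rows show which variables tend to be chosen first and at what values. Note that an average split value can be misleading when a variable is split at several distinct thresholds, so looking at the full distribution of values is usually safer than the mean alone.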

@erikcs erikcs added the question label May 1, 2024
@hanneleer
Author

Thanks a lot for your response and insights @erikcs ! I would like to pose an additional question regarding the two possibilities you highlighted, if I may.

When I fit a new forest on the most important variables, each tree typically splits first on the variable it finds most influential, so the splits vary across the trees in the forest (I assume the trees do not all split on the same variable first). With, say, 2000 trees, each might choose a different variable for its initial split and a different value of that variable to split on.

I would like an aggregate visualization that reflects the average of these splits and shows which variable tends to be prioritized first across the forest, to gain insight into the policy's differential impacts. Is this possible? Or am I limited to the get_tree function, which only provides a single tree from the forest?

Thanks a lot for your time!

@erikcs
Member

erikcs commented May 2, 2024

Hi @hanneleer, something like the heatmaps that visualize covariate levels across HTE predictions in the above tutorial link is typically what we'd recommend over focusing on every single split in the forest.
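As a rough illustration of the table behind such a heatmap, the Python sketch below ranks units by their predicted treatment effect, cuts them into equal-sized groups, and averages each covariate within each group. This is only one plausible construction and an assumption about what the tutorial plots; the function name `covariate_means_by_effect_group`, the toy predictions, and the covariate names are all hypothetical.

```python
from statistics import mean


def covariate_means_by_effect_group(predictions, covariates, num_groups=4):
    """Average each covariate within rank-based groups of predicted effects.

    `predictions` is a list of predicted treatment effects, one per unit.
    `covariates` maps a covariate name to a list of values in the same order.
    Returns {name: [mean in lowest-effect group, ..., mean in highest-effect group]}.
    """
    n = len(predictions)
    if n < num_groups:
        raise ValueError("need at least one unit per group")
    # Unit indices sorted from smallest to largest predicted effect.
    order = sorted(range(n), key=lambda i: predictions[i])
    groups = [[] for _ in range(num_groups)]
    for rank, unit in enumerate(order):
        groups[rank * num_groups // n].append(unit)
    return {
        name: [mean(column[i] for i in group) for group in groups]
        for name, column in covariates.items()
    }


# Toy data standing in for forest predictions and the covariate matrix.
predictions = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]
covariates = {
    "age": [30, 60, 40, 50, 35, 55, 25, 45],
    "income": [20, 80, 40, 60, 30, 70, 10, 50],
}

table = covariate_means_by_effect_group(predictions, covariates, num_groups=2)
for name, row in table.items():
    print(name, row)
```

Each row of the resulting table corresponds to one row of a heatmap: a covariate whose mean changes steadily from the low-effect group to the high-effect group is one that varies with the estimated effect. In practice the covariates would usually be standardized first so that rows are comparable in color scale.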

@hanneleer
Author

I will dive deeper into this, thanks a lot! @erikcs
