Variable importance () plot #1411

hanneleer · 2024-04-26T12:43:18Z

Dear all,

I was wondering if someone could help me with the following:

I use the variable importance measure to describe which variables are chosen most often by the causal forest algorithm. However, I now want to know at which levels/values each variable tended to be split (on average). Is it possible to grow a tree on the most important variables?

Thanks a lot already!

erikcs · 2024-05-01T19:31:29Z

Hi @hanneleer, you could calculate that using the function get_tree, which gives you details on the split variable and level for every tree. You can also fit a new forest on the most important variables; Algorithm 1 here gives an example of that. For other visualizations, you might find some of the example plots in this tutorial useful.
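A minimal sketch of how the per-tree split details could be aggregated is given below. It is written in Python and assumes the split information reported by get_tree has already been exported from R for every tree. The record layout (`depth`, `split_variable`, `split_value`), the helper name `summarize_splits`, and the toy numbers are all hypothetical, chosen for illustration only; none of this is part of grf.

```python
from collections import defaultdict


def summarize_splits(trees):
    """Aggregate split statistics over a forest.

    `trees` is a list of trees. Each tree is a list of internal-node records
    with a hypothetical layout:
        {"depth": int, "split_variable": str, "split_value": float}
    Leaves are assumed to have been dropped before export.
    """
    counts = defaultdict(int)        # total splits per variable
    root_counts = defaultdict(int)   # splits at depth 0 per variable
    value_sums = defaultdict(float)  # running sum of split values per variable

    for tree in trees:
        for node in tree:
            var = node["split_variable"]
            counts[var] += 1
            value_sums[var] += node["split_value"]
            if node["depth"] == 0:
                root_counts[var] += 1

    return {
        var: {
            "n_splits": counts[var],
            "n_root_splits": root_counts[var],
            "mean_split_value": value_sums[var] / counts[var],
        }
        for var in counts
    }


# Toy forest of two trees (made-up numbers, not real output).
toy_forest = [
    [
        {"depth": 0, "split_variable": "age", "split_value": 40.0},
        {"depth": 1, "split_variable": "income", "split_value": 30.0},
    ],
    [
        {"depth": 0, "split_variable": "age", "split_value": 50.0},
    ],
]

summary = summarize_splits(toy_forest)
print(summary["age"])
```

A mean split value can hide the fact that a variable is split at very different levels in different trees, so a histogram of split values per variable may be more informative than the average alone.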

hanneleer · 2024-05-02T07:40:48Z

Thanks a lot for your response and insights @erikcs ! I would like to pose an additional question regarding the two possibilities you highlighted, if I may.

When fitting a new forest on the most important variables, each tree typically splits on the most influential variable, which leads to diverse splits across the trees in the forest (I would suppose the trees do not all split on the same variable first). With, say, 2000 trees, each might choose a different variable for its initial split and a different value of that variable to split on.

I would like to produce an aggregate visualization that reflects the average of these splits and shows which variable tends to be prioritized first across the forest, to give insight into the policy's differential impacts. Is this possible? Or am I limited to the get_tree function, which only provides a single tree from the forest?

Thanks a lot for your time!

erikcs · 2024-05-02T14:52:49Z

Hi @hanneleer, something like the heatmaps that visualize covariate levels across HTE predictions in the above tutorial link is typically what we'd recommend over focusing on every single split in the forest.
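One common way to build the data behind such a heatmap is to rank units by their predicted treatment effect, cut the ranking into equal-sized groups, and average each covariate within each group. The Python sketch below assumes the predictions and covariates have been exported as plain lists. The function name `covariate_means_by_effect_group` and the toy data are hypothetical, and this is not the tutorial's own code.

```python
def covariate_means_by_effect_group(predictions, covariates, n_groups=4):
    """Mean of each covariate within groups ranked by predicted effect.

    predictions: list of floats, one predicted effect per unit.
    covariates: dict mapping covariate name -> list of floats (same length).
    Returns one dict per group, ordered from lowest to highest predicted effect.
    """
    n = len(predictions)
    if n < n_groups:
        raise ValueError("need at least one unit per group")

    # Unit indices sorted by predicted effect, smallest first.
    order = sorted(range(n), key=lambda i: predictions[i])

    rows = []
    for g in range(n_groups):
        # Contiguous slice of the ranking; group sizes differ by at most one.
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        rows.append({
            name: sum(values[i] for i in members) / len(members)
            for name, values in covariates.items()
        })
    return rows


# Toy data (made-up numbers): four units, one covariate, two groups.
toy_predictions = [0.1, 0.9, 0.4, 0.7]
toy_covariates = {"age": [30.0, 60.0, 40.0, 50.0]}

heatmap_rows = covariate_means_by_effect_group(
    toy_predictions, toy_covariates, n_groups=2
)
print(heatmap_rows)
```

Each row corresponds to one group of the heatmap and each key to one covariate. Covariates are usually standardized before plotting so that the colors are comparable across variables.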

hanneleer · 2024-05-06T07:01:38Z

I will dive deeper into this, thanks a lot! @erikcs

erikcs added the question label May 1, 2024
