Ex. 9.5 #236

szcf-weiya · 2021-03-15T07:36:22Z

szcf-weiya · 2021-05-03T02:17:58Z

The following calculation assumes that R_m is fixed, but I think it should not be treated as fixed (see the first few sentences of the calculation). However, I failed to obtain a closed form when R_m is allowed to depend on y.
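For reference, here is a sketch of what the fixed-partition calculation looks like under the exercise's definition of degrees of freedom. It assumes the regions R_1, ..., R_m do not depend on y and that the y_i are uncorrelated with common variance σ². The symbols N_j (the number of observations in R_j) and N (the sample size) are notation introduced here for illustration.

```latex
\hat y_i = \frac{1}{N_j}\sum_{k:\,x_k\in R_j} y_k \quad (x_i\in R_j),
\qquad
\operatorname{cov}(y_i,\hat y_i) = \frac{\sigma^2}{N_j},
\qquad
\mathrm{df} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\operatorname{cov}(y_i,\hat y_i)
            = \sum_{j=1}^{m}\sum_{i:\,x_i\in R_j}\frac{1}{N_j} = m .
```

When R_j is chosen by looking at y, the covariance term is no longer σ²/N_j, which is why this argument does not carry over to an adaptively grown tree.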

szcf-weiya · 2021-05-03T02:29:18Z

binary greedy partition (without pruning)

For simplicity, I wrote a no-pruning version from scratch:

julia> includet("df_regtree.jl")

julia> [rep_calc_df(maxdepth=i) for i=0:4]
5-element Vector{Tuple{Float64, Float64}}:
 (1.080629443872466, 0.041187110826985264)
 (9.996914473623567, 0.11093327184178654)
 (21.34913882167176, 0.14589493792386243)
 (34.57260013395476, 0.2469943540268895)
 (47.90389232392168, 0.31056757408950914)

The number of terminal nodes M equals 2^i, so the results show that the empirical df is much larger than M, except for M=1.
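The experiment above can be sketched roughly as follows. This is not the content of df_regtree.jl: the names `fit_tree!`, `tree_fit`, `fixed_fit`, and `estimate_df` are hypothetical, and the setup (100 observations, 10 standard Gaussian predictors, pure-noise response with σ² = 1) is taken from the exercise. The sketch estimates df = Σ cov(ŷ_i, y_i)/σ² by Monte Carlo with X held fixed, and compares a greedy depth-limited tree with a partition that ignores y.

```julia
using Random, Statistics

# Greedy binary regression tree without pruning (hypothetical helper).
# `idx` holds the row indices of the current node; fitted values go into `yhat`.
function fit_tree!(yhat, X, y, idx, depth)
    yhat[idx] .= mean(y[idx])                  # node prediction = node mean
    (depth == 0 || length(idx) < 2) && return yhat
    best_rss, best_left, best_right = Inf, Int[], Int[]
    for j in axes(X, 2)
        order = idx[sortperm(X[idx, j])]       # node rows sorted by predictor j
        for k in 1:length(order)-1
            left, right = order[1:k], order[k+1:end]
            rss = sum(abs2, y[left] .- mean(y[left])) +
                  sum(abs2, y[right] .- mean(y[right]))
            if rss < best_rss                  # keep the split with the smallest RSS
                best_rss, best_left, best_right = rss, left, right
            end
        end
    end
    fit_tree!(yhat, X, y, best_left, depth - 1)
    fit_tree!(yhat, X, y, best_right, depth - 1)
    return yhat
end

# Fitter for a tree of the given depth (at most 2^depth terminal nodes).
tree_fit(depth) = (X, y) -> fit_tree!(similar(y), X, y, collect(1:length(y)), depth)

# Fitter with 4 cells that do NOT depend on y (signs of the first two predictors).
function fixed_fit(X, y)
    cell = 2 .* (X[:, 1] .> 0) .+ (X[:, 2] .> 0)
    return [mean(y[cell .== c]) for c in cell]
end

# Monte Carlo estimate of sum_i cov(yhat_i, y_i) / sigma^2 with X held fixed.
function estimate_df(fit, X; nrep = 200, sigma = 1.0, rng = Random.default_rng())
    n = size(X, 1)
    Y = sigma .* randn(rng, n, nrep)           # each column is one response draw
    Yhat = similar(Y)
    for r in 1:nrep
        Yhat[:, r] = fit(X, Y[:, r])
    end
    return sum(cov(Y[i, :], Yhat[i, :]) for i in 1:n) / sigma^2
end

rng = MersenneTwister(1)
X = randn(rng, 100, 10)                        # fixed design, as in the exercise
df_fixed = estimate_df(fixed_fit, X; rng = rng)
df_tree = estimate_df(tree_fit(2), X; rng = rng)
println((df_fixed, df_tree))
```

Both fits have 4 terminal cells, so under these assumptions the fixed partition should give an estimate near 4, while the greedy tree should give a clearly larger value, in line with the numbers reported above.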

szcf-weiya · 2021-05-03T02:41:48Z

call `tree::prune.tree`

To return a tree with a given number of terminal nodes, I call `tree::prune.tree`, whose argument `best` specifies the number of terminal nodes. Note, however, that the specified `best` cannot always be achieved, as mentioned in `?prune.tree`:

If there is no tree in the sequence of the requested size, the next largest is returned.

In my experiments, I indeed found such cases.

> source("df_regtree.R")
> mean(replicate(10, calc_df(m=1)))
[1] 1.137945
> mean(replicate(10, calc_df(m=5)))
[1] 32.18979
> mean(replicate(10, calc_df(m=10)))
[1] 50.46288

Again, the estimated dfs are much larger than m, except for m=1.

szcf-weiya · 2021-05-03T02:51:59Z

similar experiments in Ye (1998)

Ye, J. (1998). On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association, 93(441), 120–131. https://doi.org/10.2307/2669609

It also shows that the estimated (generalized) df is much larger than m.

I obtain a result close to it when I set m=19 in my code:

> mean(replicate(10, calc_df(m=19)))
[1] 60.31616

litsh · 2023-12-05T08:24:56Z

Thanks for your great solution. May I ask why the estimated degrees of freedom is so far from the one in theory?

szcf-weiya · 2023-12-05T15:15:57Z

@litsh What do you mean by "the one in theory"? Do you mean the number of nodes? Actually, my point here is that the number of nodes is not the theoretical degrees of freedom. There is a gap, and this gap is referred to as the search cost.
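As a small illustration, the gap can be read off the Julia results reported earlier in this thread. Treating the gap simply as the empirical df minus the number of terminal nodes M = 2^i is an assumption made here for illustration; the papers cited in this reply give the formal definitions.

```julia
# Empirical df reported earlier in this thread for maxdepth = 0:4
df = [1.080629443872466, 9.996914473623567, 21.34913882167176,
      34.57260013395476, 47.90389232392168]
M = 2 .^ (0:4)          # number of terminal nodes at each depth
gap = df .- M           # excess df beyond the node count
println(round.(gap, digits = 2))
```

The gap is close to zero for M=1, where no split is searched for, and grows as more splits are chosen adaptively.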

If you are interested, you can check the following paper, which studies the excess degrees of freedom by comparing the lasso and best subset regression: Tibshirani, Ryan J. “Degrees of Freedom and Model Search.” Statistica Sinica 25, no. 3 (2015): 1265–96.

I also discussed the search cost of degrees of freedom for more methods (including the regression tree here) in my paper. https://arxiv.org/abs/2308.13630

litsh · 2023-12-06T02:33:16Z

Thank you for your reply! I will read the paper.

szcf-weiya added this to the Solutions 9 milestone Mar 15, 2021

szcf-weiya added the exercise label Mar 15, 2021

szcf-weiya added the enhancement label May 3, 2021

szcf-weiya added a commit that referenced this issue May 3, 2021

add code for #236

506edf6

szcf-weiya mentioned this issue May 3, 2021

Ex 9.5a YuhangZhou88/ESL_Solution#3

Closed
