ATE over subsets with low and high estimated CATEs - nonsensical results #1287

Open
RamirezAmayaS opened this issue Apr 9, 2023 · 4 comments

RamirezAmayaS commented Apr 9, 2023

Description of the bug
I am revisiting the analysis of Athey and Wager (2019). I am interested in running a falsification analysis in which the causal forests are trained not on the student and school covariates but on randomly generated vectors. My prior is that the heterogeneity tests should fail to reject the null of no heterogeneity. However, when I compare subsets with high and low estimated CATEs, the estimated average treatment effect on the high subset is close to zero while the estimated average treatment effect on the low subset is orders of magnitude larger. I can't find an explanation for this behavior. Is it a bug?

The other tests seem fine. The global ATE is close enough to the original results. The calibration test fails to reject the null of no heterogeneity.

Steps to reproduce

library(grf)

df = read.csv("experiments/acic18/synthetic_data.csv")

# Replace the real covariates with pure noise
X = matrix(runif(n = nrow(df) * 10), nrow = nrow(df))
colnames(X) = c("RF1","RF2","RF3","RF4","RF5","RF6","RF7","RF8","RF9","RF0")

Z = df$Z
Y = df$Y

Y.forest = regression_forest(X, Y)
Y.hat = predict(Y.forest)$predictions

Z.forest = regression_forest(X, Z)
Z.hat = predict(Z.forest)$predictions

cf.raw = causal_forest(X, Y, Z, Y.hat = Y.hat, W.hat = Z.hat)

varimp = variable_importance(cf.raw)
selected.idx = which(varimp > mean(varimp))

cf = causal_forest(
    X[, selected.idx],
    Y,
    Z,
    Y.hat = Y.hat,
    W.hat = Z.hat,
    tune.parameters = "all"
)

tau.df = predict(cf, estimate.variance = TRUE)[, c(1, 2)]
tau.hat = tau.df$predictions

# Distribution of predicted effects
hist(tau.hat)

# Average treatment effect
ATE = average_treatment_effect(cf)
paste(
    "95% CI for the ATE:", 
    round(ATE[1],3), 
    "+/-", 
    round(qnorm(0.975)*ATE[2],3)
)

Outputs: '95% CI for the ATE: 0.303 +/- 0.026'

# Compare regions with high and low estimated CATE
high_effect = tau.hat > median(tau.hat)
ate.high = average_treatment_effect(cf, subset=high_effect)
ate.low = average_treatment_effect(cf, subset=!high_effect)
paste(
    "95% CI for the difference in ATE:",
    round(ate.high[1] - ate.low[1],3),
    "+/-",
    round(qnorm(0.975)*sqrt(ate.high[2]^2 + ate.low[2]^2),3)
)

Outputs: '95% CI for the difference in ATE: -0.56 +/- 0.051'

average_treatment_effect(cf, subset=high_effect)

Outputs: estimate: -0.00124768810374905 std.err: 0.0182608951524164

average_treatment_effect(cf, subset=!high_effect)

Outputs: estimate: 0.608046759001875 std.err: 0.0182508648601049

# Test calibration
test_calibration(cf)

Outputs:

Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:

                                  Estimate  Std. Error t value Pr(>t)    
mean.forest.prediction            1.001729    0.041462  24.160 <2e-16 ***
differential.forest.prediction -682.911383   24.255158 -28.155      1    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

GRF version
grf_2.2.1

erikcs commented Apr 11, 2023

Hi @RamirezAmayaS, what you are observing is unfortunately a known artifact of doing these kinds of evaluations using Out-of-Bag (OOB) estimates. The suggested modern approach is to use the RATE with a training and evaluation sample. If you repeat your example from above, then you should see a flat TOC curve / zero RATE (when using a train/test split).
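Something along these lines (a sketch reusing the names from your script; the split details are illustrative):

# Illustrative train/evaluation split for the RATE check
n = nrow(X)
train = sample(n, floor(n / 2))

# A forest fit on the training half provides the prioritization scores
cf.train = causal_forest(X[train, ], Y[train], Z[train])
priority.cate = predict(cf.train, X[-train, ])$predictions

# A separate forest fit on the held-out half is used for evaluation
cf.eval = causal_forest(X[-train, ], Y[-train], Z[-train])

# With pure-noise covariates the TOC should be flat and the RATE near zero
rate = rank_average_treatment_effect(cf.eval, priority.cate)
plot(rate)
rate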

RamirezAmayaS commented Apr 11, 2023

Hi @erikcs, thanks for the suggestion. I'll try the RATE approach. Do you happen to know of a reference explaining why the OOB evaluation fails?

erikcs commented Apr 14, 2023

I'm not sure about a reference, but here is a simple example illustrating the issue with an OOB mean:

Let $Y_i \sim \text{Bernoulli}(\mu)$, $i = 1, \ldots, n$, with mean $\mu = 0.5$, and let $\bar Y$ denote the sample mean.

Then the OOB (leave-one-out) estimate of the mean for observation $i$ is $\hat\mu^{(-i)} = \bar Y - (Y_i - \bar Y) / (n - 1)$, which lies above $\bar Y$ exactly when $Y_i = 0$. Consequently

$E[Y_i \mid \hat\mu^{(-i)} > \bar Y] = 0$,

$E[Y_i \mid \hat\mu^{(-i)} < \bar Y] = 1$:

splitting the sample on its own OOB estimates splits on the noise in $Y_i$ itself.
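A quick simulation of this point (for illustration):

# Simulate the leave-one-out artifact for a Bernoulli mean
set.seed(42)
n = 10000
Y = rbinom(n, size = 1, prob = 0.5)

# Leave-one-out ("OOB") mean for each observation
mu.loo = (sum(Y) - Y) / (n - 1)

# Conditioning on the OOB estimate conditions on Y itself
mean(Y[mu.loo > mean(Y)])  # exactly 0
mean(Y[mu.loo < mean(Y)])  # exactly 1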

RamirezAmayaS commented

Thanks for your reply.

I don't think I'm following. Shouldn't the OOB mean be $\hat\mu^{(-j)} = \frac{1}{n-1} \sum_{i \neq j} Y_i$?
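(Writing it out, I see the two expressions coincide: rearranging the leave-one-out mean gives

$\frac{1}{n-1} \sum_{i \neq j} Y_i = \frac{n \bar Y - Y_j}{n - 1} = \bar Y - \frac{Y_j - \bar Y}{n - 1} = \hat\mu^{(-j)}$,

which is the form in your comment, so they are the same estimator.)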
