RATE with low treatment propensities --- target.sample="treated"? #1332

Open
robert702 opened this issue Aug 24, 2023 · 4 comments
@robert702

robert702 commented Aug 24, 2023

I am using causal_forest for an RCT where the treatment propensity is very low: N control is 1 million, N treatment is 20,000.

When I calculate average_treatment_effect I get a warning that I should use the option target.sample="treated". That estimate is indeed quite different from the overall average_treatment_effect (despite randomization), and it is closer to what I get using OLS, which makes sense.

I now want to use RATE to evaluate the presence of heterogeneity. I wonder whether I should make any adjustment to account for the low treatment propensities. If there is no built-in option, I could modify the source code myself, but any guidance on whether something like this is needed would be greatly appreciated.

Thanks.

@erikcs
Member

erikcs commented Aug 24, 2023

Hi @robert702, that's an interesting question. Since the AUTOC can be represented as a weighted ATE ((8) in https://arxiv.org/pdf/2111.07966.pdf), I wonder if RATE + Crump et al. (2009)'s subsetting via estimated propensities is reasonable. What do you say, @syadlowsky?

You could estimate this with for example the following, computing the AUTOC for units with estimated propensities larger than 0.1:

rank_average_treatment_effect(evaluation.forest,
                              priorities,
                              subset = evaluation.forest$W.hat > 0.1)
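
For readers who want to try this end to end, here is a minimal self-contained sketch. The simulated data, the propensity model, and the names `priority.forest`, `keep`, `rate.all`, and `rate.trimmed` are illustrative assumptions, not part of the original suggestion. Note that `priorities` has one entry per unit in the evaluation forest; `subset` then selects among those units.

```r
library(grf)

set.seed(1)
n <- 4000
p <- 5
X <- matrix(rnorm(n * p), n, p)

# Assumed design for illustration: propensities between 0.05 and 0.5.
e <- 0.05 + 0.45 * plogis(2 * X[, 2])
W <- rbinom(n, 1, e)
tau <- pmax(X[, 1], 0)
Y <- X[, 2] + tau * W + rnorm(n)

# Fit the prioritization rule on one half of the data.
train <- sample(n, n / 2)
priority.forest <- causal_forest(X[train, ], Y[train], W[train])
priorities <- predict(priority.forest, X[-train, ])$predictions

# Evaluate it on the other half.
evaluation.forest <- causal_forest(X[-train, ], Y[-train], W[-train])

# Keep only units whose estimated propensity exceeds 0.1.
keep <- evaluation.forest$W.hat > 0.1

rate.all <- rank_average_treatment_effect(evaluation.forest, priorities)
rate.trimmed <- rank_average_treatment_effect(evaluation.forest, priorities,
                                              subset = keep)

print(c(all = rate.all$estimate, trimmed = rate.trimmed$estimate))
```

The trimmed estimate describes heterogeneity only among the retained units, so it answers a question about that subpopulation rather than the full sample.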

@robert702
Author

robert702 commented Aug 24, 2023

Thanks Erik,

I was thinking of something like calculating the TOC "manually": computing ATEs in the test sample with average_treatment_effect (target.sample="treated"), over bins defined by priorities taken from the training sample.

Specifically, the RATE source code has:
//
ATE <- sum(DR.scores.sorted * sample.weights) / sample.weights.sum
TOC <- cumsum(DR.scores.sorted * sample.weights) / sample.weights.cumsum - ATE
RATE <- wtd.mean(TOC, sample.weights)
//
This is from lines 440-442 here: https://github.com/grf-labs/grf/blob/master/r-package/grf/R/rank_average_treatment.R
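
To make the three quoted lines concrete, here is a small base-R sketch that reproduces the same arithmetic on made-up scores. The inputs (`priorities`, `DR.scores`) are hypothetical stand-ins for the doubly robust scores, ties in the priorities are ignored, and `wtd.mean` is assumed to be a plain weighted mean, as for the AUTOC target. With equal weights, the same number can also be written as a weighted average of the sorted scores, which is the "weighted ATE" view mentioned earlier in the thread.

```r
set.seed(42)
n <- 1000

# Hypothetical inputs: stand-ins for priorities and doubly robust scores.
priorities <- rnorm(n)
DR.scores <- 0.5 * priorities + rnorm(n)
sample.weights <- rep(1, n)

# Sort the scores by decreasing priority (highest priority first).
ord <- order(priorities, decreasing = TRUE)
DR.scores.sorted <- DR.scores[ord]

# Same arithmetic as the quoted lines.
ATE <- sum(DR.scores.sorted * sample.weights) / sum(sample.weights)
TOC <- cumsum(DR.scores.sorted * sample.weights) / cumsum(sample.weights) - ATE
AUTOC <- sum(TOC * sample.weights) / sum(sample.weights)

# Equivalent form with equal weights: the unit ranked j gets weight
# (1/j + 1/(j+1) + ... + 1/n) - 1, so top-ranked units count the most.
harmonic.tail <- rev(cumsum(rev(1 / seq_len(n))))
AUTOC.weighted <- mean((harmonic.tail - 1) * DR.scores.sorted)

print(c(AUTOC = AUTOC, AUTOC.weighted = AUTOC.weighted))
```

The last TOC entry is zero by construction, since the "top 100%" group is the whole sample.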

For the TOC, I was thinking of taking the priorities from the original forest and splitting them into 100 groups. Then, accumulating over the groups, instead of taking the average of the scores, I can calculate the average treatment effect on the treated using the correction in the average_treatment_effect function, via the option target.sample = "treated" or the "overlap" version.

In the aggregate, treatment effects with target.sample="treated" and target.sample="overlap" indeed give very similar results.

Does this seem like a reasonable approach to you? Or is there a conceptual misunderstanding?

Here is a rough script ---

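rm(list = ls())
library(grf)

n <- 15000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
event.prob <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2])))
Y <- rbinom(n, 1, event.prob)
train <- sample(1:n, n / 2)

# Priorities: CATE predictions on the test sample from a forest fit on the training sample.
cf.priority <- causal_forest(X[train, ], Y[train], W[train])
priority.cate <- 1 * predict(cf.priority, X[-train, ])$predictions

# Bin the priorities into centiles; include.lowest = TRUE keeps the minimum from becoming NA.
centile <- cut(priority.cate,
               breaks = quantile(priority.cate, probs = seq(0, 1, by = 0.01)),
               labels = FALSE, include.lowest = TRUE)
summary(centile)

# Group 1 = highest priority, group 100 = lowest.
prioritygroup <- 101 - centile

cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])

# average_treatment_effect returns c(estimate, std.err); keep only the estimate.
ATE <- average_treatment_effect(cf.eval, target.sample = "treated")[["estimate"]]

TOC <- numeric(100)
for (i in 1:100) {
  # Effect on the treated among the top i priority groups, minus the overall effect.
  TOC[i] <- average_treatment_effect(cf.eval, subset = (prioritygroup <= i),
                                     target.sample = "treated")[["estimate"]] - ATE
}

plot(TOC, type = "l", xlab = "Priority group", ylab = "ATE of priority group - ATE", main = "TOC")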

@erikcs
Member

erikcs commented Aug 25, 2023

My immediate reaction would be to just do what's posted above; that's one of the reasons I added the subset argument to the rate function.

@robert702
Author

Thanks Erik. I imagine that could work when there are enough observations with propensities above 0.1. As I was saying earlier, in my setting the mass of propensities is at 0.02, so the simple subsetting you proposed would not work.

I could just take a random sample of the control group to get a more balanced design, or use the suggestion described in the documentation of the average_treatment_effect function, as I described in my previous post: target.sample = "treated" or target.sample = "overlap". Any thoughts on which of the two would be more appropriate? Or are there alternative approaches when there are basically no observations with propensities > 0.1?

Thanks in advance!!
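
The first option above (randomly subsampling the control group) can be sketched in base R as follows. This is only an illustration under stated assumptions: the sample sizes are scaled down from the ones in the question, and `controls.per.treated` is a hypothetical tuning choice, not a recommendation from the thread.

```r
set.seed(123)

# Scaled-down stand-in for the design in the question: about 2% treated.
n.treated <- 2000
n.control <- 100000
W <- sample(c(rep(1, n.treated), rep(0, n.control)))

# Hypothetical choice: keep 4 randomly chosen controls per treated unit.
controls.per.treated <- 4

treated.idx <- which(W == 1)
control.idx <- which(W == 0)
kept.controls <- sample(control.idx, controls.per.treated * length(treated.idx))

# Row indices of the more balanced subsample, in original order.
keep <- sort(c(treated.idx, kept.controls))

print(c(before = mean(W), after = mean(W[keep])))
```

The rows `X[keep, ]`, `Y[keep]`, and `W[keep]` could then be passed to causal_forest. Because treatment is randomized, dropping controls at random raises the treated share without changing the conditional treatment effects, at the cost of discarding data; whether that trade-off beats target.sample = "treated" or "overlap" is the open question here.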
