[WIP] add test implementation of carrillo and rosenbaum #208

Open · wants to merge 1 commit into main
Conversation

@ljwolf (Member) commented May 31, 2022

This is a test implementation of the Carrillo and Rosenbaum (2016) counterfactual spatial distribution estimator. I'm not sure of the precise mathematical relationship between this and Cortes et al. (2021)'s CDF estimator...

since this is general-purpose resampling, should it live in esda? I suppose that the cdf counterfactualizer is similarly generic... @renanxcortes @knaaptime @sjsrey would you rather this kind of thing live in segregation, too?

@ljwolf ljwolf changed the title add test implementation of carrillo and rosenbaum counterfactual esti… [WIP] add test implementation of carrillo and rosenbaum counterfactual esti… May 31, 2022
@ljwolf ljwolf changed the title [WIP] add test implementation of carrillo and rosenbaum counterfactual esti… [WIP] add test implementation of carrillo and rosenbaum May 31, 2022
    self.actual_ = y
    self.counterfactual_ = self.tau_ * self.actual_
@ljwolf (Member Author) commented:

I'm not yet certain that this does the "right" thing. C&R say you need to apply tau against P(y1, y2 | tau), and the empirical distribution of that is actual_ (y). But I'm not sure where the kernel density re-weighting needs to come in; I need to continue working on it.
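For reference, here is a minimal numpy sketch of the odds-ratio re-weighting step as I understand it from this thread. The names (`p_score`, `tau`, `cf_mean`) and the marginal-share correction are my own assumptions for illustration, not the PR's API, and the propensity scores are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic two-period data: y is the outcome, t marks period 1
n = 1000
t = rng.integers(0, 2, size=n)
y = rng.normal(loc=t, scale=1.0, size=n)

# hypothetical propensity scores P(t=1 | x_i); drawn at random here,
# since the point is only the re-weighting step, not the model
p_score = rng.uniform(0.2, 0.8, size=n)

# odds-ratio weight: tau_i = [p_i / (1 - p_i)] * [(1 - pi) / pi],
# where pi is the marginal share of period-1 observations
pi = t.mean()
tau = (p_score / (1 - p_score)) * ((1 - pi) / pi)

# re-weighted ("counterfactual") mean of the period-0 outcomes
mask = t == 0
cf_mean = np.average(y[mask], weights=tau[mask])
```

The open question above is whether `tau` should multiply the observations themselves (as in the diff) or enter as weights in a kernel density estimate of the counterfactual distribution, as sketched here.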

@knaaptime (Member) commented:

hm, maybe the opposite, actually. I think of esda as the second-most central layer in the pysal stack, so it might be preferable to move some of the counterfactual generators from segregation over here instead. I rewrote them all for parallelization maybe a year ago, so they all live in one spot; that would make them easy to port into esda if they're useful elsewhere in the ecosystem.

@renanxcortes commented May 31, 2022

Hi @ljwolf and @knaaptime, long time no see! I remember this was one of the first things I developed when I arrived at the CGS; I built an R framework with a plotly implementation of Carrillo and Rosenbaum (2016). The main hint was that the binary dependent-variable model for the propensity score matching should "separate" the groups well; that is why the logistic regression used has some non-linear terms (the authors explained this to me by e-mail). I'll look through my historical files and share them with you, ok?

Did you receive the e-mail with the files attached (MATLAB and R code), @ljwolf and @knaaptime?

@renanxcortes commented May 31, 2022

I'm not sure of the precise mathematical relationship between this and Cortes et al. (2021)'s cdf estimator...

So, technically, I believe they are quite different, since their approach relies on matching using covariates, whereas our approach is not modeled with covariates...

@ljwolf (Member Author) commented Jun 1, 2022

long time no see!

Yes! good to see you (virtually) 😄

Did you receive the e-mail?

Yes! Thank you very much, @renanxcortes!! That's super helpful.

binary dependent model for the psm should "separate" well the groups

Yes, this definitely makes sense, because the "power" of the method is based on that odds-ratio weight, tau. In theory (and in this implementation), you could use any estimator that provides predicted probabilities for observation i to be in time t, given its traits x_i (the method could use trees, XGBoost, ANNs, etc.).
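To sketch that plug-in idea: any model exposing a scikit-learn-style `predict_proba` could supply the propensity scores that feed tau. The toy logistic classifier and the helper `odds_weights` below are stand-ins of my own, not the PR's code:

```python
import numpy as np

def odds_weights(prob_t1, share_t1):
    """Odds-ratio weights tau_i from predicted P(t=1 | x_i)."""
    return (prob_t1 / (1 - prob_t1)) * ((1 - share_t1) / share_t1)

class ToyLogit:
    """Stand-in for any classifier exposing predict_proba."""
    def fit(self, X, t, steps=500, lr=0.1):
        X1 = np.column_stack([np.ones(len(X)), X])
        self.beta = np.zeros(X1.shape[1])
        for _ in range(steps):
            p = 1 / (1 + np.exp(-X1 @ self.beta))
            # gradient ascent on the logistic log-likelihood
            self.beta += lr * X1.T @ (t - p) / len(X)
        return self

    def predict_proba(self, X):
        X1 = np.column_stack([np.ones(len(X)), X])
        p = 1 / (1 + np.exp(-X1 @ self.beta))
        return np.column_stack([1 - p, p])

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
t = (rng.uniform(size=500) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

model = ToyLogit().fit(X, t)   # could be XGBoost, a tree, an ANN, ...
p1 = model.predict_proba(X)[:, 1]
tau = odds_weights(np.clip(p1, 1e-6, 1 - 1e-6), t.mean())
```

Anything with the same two-column `predict_proba` contract could be dropped in for `ToyLogit` without touching the re-weighting step.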

I believe they are quite different...

I know it looks that way on the surface. But the C&R approach can simply "ignore" X and still construct tau and re-weight... so I wonder whether there may be a way to relate them formally, via the "pooled" ECDF, in the case of no exogenous information.
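One way to see that no-covariate limit (my own illustration, not from either paper): with no X, the fitted propensity for every observation is just the marginal share pi, so the odds-ratio weight collapses to 1 and the "counterfactual" distribution is just the empirical one:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=200)
t = rng.integers(0, 2, size=200)

pi = t.mean()    # with no covariates, P(t=1 | x_i) = pi for all i
tau = (pi / (1 - pi)) * ((1 - pi) / pi)   # the odds ratios cancel to 1

# with constant weights, the re-weighted mean is just the plain mean
weights = np.full((t == 0).sum(), tau)
cf_mean = np.average(y[t == 0], weights=weights)
assert np.isclose(cf_mean, y[t == 0].mean())
```

So in the no-information case the C&R estimator seems to reduce to comparing raw empirical distributions, which is where a formal link to a pooled-ECDF construction might live.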

Regardless, I'll adapt and extend the R script you sent along to correct this test implementation! And I agree with @knaaptime that it makes sense to have them in the same place if there's more than one... I agree esda makes sense, but I really don't mind wherever these end up!

@renanxcortes

Awesome!
