
weight matrix in IPW calculation can have inf weights due to division by zero #9

Open
glotglutton opened this issue Jul 8, 2020 · 2 comments

Comments

@glotglutton

The statement weight_matrix = probabilities.rdiv(1.0) in this file can return inf weights if some entries in the "probabilities" series are zero. Maybe there could be some way to ignore the corresponding inf weights when weight_matrix is applied later on.
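A minimal reproduction of the behaviour described, assuming probabilities is a pandas Series:

import pandas as pd

probabilities = pd.Series([0.5, 0.0, 0.25])
weight_matrix = probabilities.rdiv(1.0)  # -> [2.0, inf, 4.0]; the zero entry becomes an infinite weight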

@ehudkr
Collaborator

ehudkr commented Jul 9, 2020

Thanks for the input.
I'll admit that, unless I'm missing something, what you're describing sounds more like a modelling issue than a bug. For reasons I describe below, I think it would be bad design to sweep it under the rug. However, let's first start with two practical workarounds I can think of:

One possible solution to avoid zero-propensities would be to clip your propensities. That can be done with the truncate_eps parameter, for example:

from causallib.estimation import IPW
from sklearn.linear_model import LogisticRegression

ipw = IPW(learner=LogisticRegression(), truncate_eps=0.01)  # IPW needs a propensity model (learner)

will clip all propensity scores between 0.01 and 0.99, so all your zeros will become 0.01, thus avoiding a division by zero when inverting the scores into weights.
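As a small sketch of the effect (plain pandas, not causallib code), clipped scores invert into finite weights:

import pandas as pd

probs = pd.Series([0.0, 0.5, 1.0]).clip(0.01, 0.99)  # propensities clipped to [0.01, 0.99]
weights = probs.rdiv(1.0)                            # -> [100.0, 2.0, ~1.01], all finite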

Alternatively, you can first obtain the weights, manipulate them as you wish, and then provide them to the potential outcomes estimation:

ipw.fit(X, a)  # fit the propensity model before computing weights
w = ipw.compute_weights(X, a)
# Manipulate `w` as you wish (also the corresponding `X, a, y` if needed)
outcomes = ipw.estimate_population_outcome(X, a, y, w)
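For instance, if you want to ignore the infinite weights you mentioned, one possible manipulation (just a sketch, assuming w, X, a, and y are pandas objects sharing the same index) is:

import numpy as np

mask = np.isfinite(w)  # keep only samples with finite weights
outcomes = ipw.estimate_population_outcome(X[mask], a[mask], y[mask], w[mask])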

Theoretical context

So those were technical data manipulations to work around zero-propensities.
The reason I want users to be aware of that, and not solve it under the hood, is that in a causal-inference context, zero propensity scores can suggest a violation of the positivity assumption¹, which, in turn, could potentially invalidate the results one gets from the model.

As for your case, if I may suggest, you might want to try evaluating your model and possibly detect where the problem originates from:

from causallib.evaluation import PropensityEvaluator

evaluator = PropensityEvaluator(ipw)
evaluations = evaluator.evaluate_simple(
    X, a, y,
    plots=["roc_curve", "weight_distribution", "covariate_balance_love", "calibration"]
)

¹The positivity assumption is formulated as Pr[A=a|X] > 0 for a ∈ {0, 1}. Intuitively, zero propensity suggests there's a subspace of covariates occupied by the control group alone, and so, without similar data points from the intervention group in that subspace, it is impossible to extrapolate what the outcomes of these control data points would have been had they received the treatment.
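As a rough illustration (hypothetical covariate name, assuming X is a pandas DataFrame and a is a 0/1 Series), one crude way to spot such a subspace is to cross-tabulate a categorical covariate against the treatment assignment:

import pandas as pd

crosstab = pd.crosstab(X["region"], a)     # "region" is a hypothetical categorical covariate
control_only = crosstab[crosstab[1] == 0]  # strata containing only control samples hint at a positivity violation
print(control_only)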

@ehudkr
Collaborator

ehudkr commented Jul 9, 2020

I'll be happy to know if that helped you solve your problem, or to get any feedback on whether this approach makes sense to you.
Thanks again for taking the time to raise this issue.
