Vectorize matching and ATE computation to speed up computation #935

Open
jcreinhold opened this issue May 3, 2023 · 0 comments
Labels
enhancement New feature or request

jcreinhold commented May 3, 2023

Is your feature request related to a problem? Please describe.
dowhy is a great package, but the matching methods are unnecessarily slow for large datasets.

Describe the solution you'd like
In propensity score matching and distance matching, the matching and ATT/ATC computation are done in a Python for loop, which becomes prohibitively slow for large datasets (e.g., here). The loop is also unnecessary, because the same arithmetic can be done on whole arrays at once.

For example, we can change this:

for i in range(numtreatedunits):
    treated_outcome = treated.iloc[i][self._target_estimand.outcome_variable[0]].item()
    control_outcome = control.iloc[indices[i]][self._target_estimand.outcome_variable[0]].item()
    att += treated_outcome - control_outcome

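To something like this (untested, but something along these lines should work):

outcome_variable = self._target_estimand.outcome_variable[0]
# Convert to NumPy so the subtraction is positional; subtracting two pandas
# Series would align on index labels instead of matching row i to indices[i].
treated_outcome = treated[outcome_variable].to_numpy()
# Assumes indices holds exactly one control row position per treated unit.
control_outcome = control.iloc[indices.ravel()][outcome_variable].to_numpy()
# mean() already divides by the number of treated units.
att = (treated_outcome - control_outcome).mean().item()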

For datasets with >100k observations, this can provide a large speed-up.
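
To illustrate that the two forms agree, here is a minimal, self-contained sketch on synthetic data. The frames, the column name `y`, the randomly drawn `indices`, and the helper names `att_loop` and `att_vectorized` are all hypothetical stand-ins, not dowhy code. It assumes `indices` has shape `(n_treated, 1)`, i.e. one matched control row position per treated unit, and that the loop's running sum is divided by the number of treated units afterwards.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the estimator's treated and control frames.
# The treated index is deliberately offset so index labels never line up.
treated = pd.DataFrame(
    {"y": rng.normal(1.0, 1.0, size=500)}, index=np.arange(1000, 1500)
)
control = pd.DataFrame({"y": rng.normal(0.0, 1.0, size=800)})

# Assumed shape (n_treated, 1): one control row position per treated unit.
# Drawn at random here; a real matcher would produce these positions.
indices = rng.integers(0, len(control), size=(len(treated), 1))


def att_loop(treated, control, indices, outcome="y"):
    """Row-by-row version, mirroring the loop quoted in the issue."""
    total = 0.0
    for i in range(len(treated)):
        treated_outcome = treated.iloc[i][outcome].item()
        control_outcome = control.iloc[indices[i]][outcome].item()
        total += treated_outcome - control_outcome
    return total / len(treated)


def att_vectorized(treated, control, indices, outcome="y"):
    """Array version: positional gather, then one subtraction and one mean."""
    treated_outcome = treated[outcome].to_numpy()
    control_outcome = control[outcome].to_numpy()[indices.ravel()]
    return float((treated_outcome - control_outcome).mean())


loop_result = att_loop(treated, control, indices)
vectorized_result = att_vectorized(treated, control, indices)
print(np.isclose(loop_result, vectorized_result))
```

Working on NumPy arrays rather than pandas Series keeps the subtraction positional, so row `i` of the treated outcomes is paired with the control row at `indices[i]` regardless of the frames' index labels.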

Describe alternatives you've considered

Leaving it as is, though there isn't much reason to. dowhy already uses numpy and pandas, so we should use their vectorized constructs to avoid Python interpreter overhead.

Additional context
I can submit a PR if this is acceptable.
