Vectorize matching and ATE computation to speed up computation #935

jcreinhold · 2023-05-03T18:38:42Z

Is your feature request related to a problem? Please describe.
dowhy is a great package, but the matching methods are unnecessarily slow for large datasets.

Describe the solution you'd like
In propensity score matching and distance matching, the matching and the ATT/ATC computation are done in a Python for loop, which becomes prohibitively time-consuming for large datasets (e.g., here). The loop is also unnecessary, because the same arithmetic can be expressed as array operations.

For example, we can change this:

for i in range(numtreatedunits):
    treated_outcome = treated.iloc[i][self._target_estimand.outcome_variable[0]].item()
    control_outcome = control.iloc[indices[i]][self._target_estimand.outcome_variable[0]].item()
    att += treated_outcome - control_outcome

To something like this (untested, but it should work along these lines):

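outcome_variable = self._target_estimand.outcome_variable[0]
# Convert to NumPy arrays so the subtraction is positional; two pandas Series would align on index labels.
treated_outcome = treated[outcome_variable].to_numpy()
# np.ravel flattens indices in case it is 2-D with one match per row (assumes numpy is imported as np).
control_outcome = control.iloc[np.ravel(indices)][outcome_variable].to_numpy()
# mean() already divides by the number of treated units.
att = (treated_outcome - control_outcome).mean().item()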

For datasets with >100k observations, this can provide a large speed-up.
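
A minimal sketch of how the vectorized version could be checked against the loop on synthetic data. The frames, the column name `y`, and the shape of `indices` (one matched control position per treated unit) are assumptions for illustration, not dowhy's actual internals.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the estimator's treated/control frames.
# The index labels are deliberately different to expose alignment issues.
treated = pd.DataFrame({"y": rng.normal(1.0, 1.0, size=5)}, index=[10, 11, 12, 13, 14])
control = pd.DataFrame({"y": rng.normal(0.0, 1.0, size=8)}, index=range(100, 108))

# Assumed shape (n_treated, 1): position of the matched control for each treated unit.
indices = np.array([[3], [0], [7], [2], [5]])


def att_loop(treated, control, indices, outcome):
    """Row-by-row version, mirroring the loop quoted above."""
    total = 0.0
    for i in range(len(treated)):
        treated_outcome = treated.iloc[i][outcome].item()
        control_outcome = control.iloc[indices[i]][outcome].item()
        total += treated_outcome - control_outcome
    return total / len(treated)


def att_vectorized(treated, control, indices, outcome):
    """Array version: positional subtraction on NumPy arrays."""
    treated_outcome = treated[outcome].to_numpy()
    control_outcome = control.iloc[np.ravel(indices)][outcome].to_numpy()
    return (treated_outcome - control_outcome).mean().item()


loop_att = att_loop(treated, control, indices, "y")
vectorized_att = att_vectorized(treated, control, indices, "y")

# Subtracting the pandas Series directly aligns on index labels instead of
# position; with disjoint labels every difference is NaN.
matched = control.iloc[np.ravel(indices)]["y"]
naive_att = (treated["y"] - matched).mean()
```

The `naive_att` line shows why the conversion to NumPy arrays matters: pandas aligns two Series on their index labels before subtracting, so treated and matched control rows with different labels do not pair up by position.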

Describe alternatives you've considered

Leaving it as is, but there isn't much reason to. We are already using NumPy and pandas, so we should use their vectorized constructs to avoid Python interpreter overhead.
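
A rough way to measure the interpreter overhead on your own machine is a small `timeit` harness like the one below. It is only a sketch: the data are synthetic, the sizes are arbitrary, and `indices` is assumed to hold one matched control position per treated unit.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_treated, n_control = 2_000, 5_000  # arbitrary illustrative sizes

# Hypothetical treated/control frames with a single outcome column "y".
treated = pd.DataFrame({"y": rng.normal(size=n_treated)})
control = pd.DataFrame({"y": rng.normal(size=n_control)})
indices = rng.integers(0, n_control, size=(n_treated, 1))


def loop_att():
    # One pandas row lookup per treated unit, all driven by the interpreter.
    att = 0.0
    for i in range(n_treated):
        att += treated.iloc[i]["y"].item() - control.iloc[indices[i]]["y"].item()
    return att / n_treated


def vectorized_att():
    # A single positional lookup and one array subtraction.
    diff = treated["y"].to_numpy() - control.iloc[indices.ravel()]["y"].to_numpy()
    return diff.mean().item()


loop_seconds = timeit.timeit(loop_att, number=1)
vectorized_seconds = timeit.timeit(vectorized_att, number=1)
print(f"loop: {loop_seconds:.4f} s, vectorized: {vectorized_seconds:.4f} s")
```

Timings depend on hardware and library versions, so the printed numbers are best read as a relative comparison between the two functions.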

Additional context
I can submit a PR if this is acceptable.


jcreinhold added the enhancement (New feature or request) label on May 3, 2023

rahulbshrestha mentioned this issue on May 12, 2024: Vectorize operations for propensity score matching #1179 (Open, 3 tasks)
