Is your feature request related to a problem? Please describe.
dowhy is a great package, but the matching methods are unnecessarily slow for large datasets.
Describe the solution you'd like
In propensity score matching and distance matching, the matching and ATT/ATC computation is done in a Python for loop, which becomes prohibitively time-consuming for large datasets (e.g., here). It's also unnecessary: the loop can be replaced with vectorized numpy/pandas operations (untested, but something along those lines should work).
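A rough sketch of the kind of vectorized replacement being proposed (untested against dowhy's internals; the function name `vectorized_att` and the use of scikit-learn's `NearestNeighbors` are illustrative assumptions, not dowhy's actual code):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def vectorized_att(ps, treatment, outcome):
    """Estimate the ATT via 1-NN propensity score matching without a Python loop.

    ps, treatment, outcome: 1-D numpy arrays of equal length; treatment is 0/1.
    """
    treated = treatment == 1
    control = ~treated
    # Build the nearest-neighbor index over control propensity scores once...
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
    # ...then query all treated units in a single batched call.
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
    # Matched control outcomes, gathered with one fancy-index operation.
    matched_outcomes = outcome[control][idx.ravel()]
    return np.mean(outcome[treated] - matched_outcomes)
```

The per-treated-unit work (nearest-control lookup and outcome differencing) all happens inside compiled numpy/scikit-learn code, which is where the speed-up for large datasets would come from.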
For datasets with >100k observations, this can provide a large speed-up.
Describe alternatives you've considered
Leaving it as is. There isn't much reason to do so: we already use numpy and pandas, so we should take advantage of their vectorized constructs to avoid the Python interpreter overhead.
Additional context
I can submit a PR if this is acceptable.