DynamicDML() issue: AttributeError: Provided crossfit folds contain training splits that don't contain all treatments #859
Comments
Was this solved? I'm having the same issue using dowhy.fit.
No, I have not heard back yet.
I realized my instrument variable wasn't binary, and it had to be. I'm using EconML and DoWhy in tandem, following the EconML sample notebooks online.
Sorry for the slow response. A couple of thoughts:
Hi @kbattocchi and @samanbanafti, I am wondering how this issue can be solved. I encountered the same problem when using Causal Forest DML with dowhy fit and discrete_treatment=True. My treatment is a categorical variable of category type, with values such as "High Impact", "Medium Impact" and "Low Impact". The model worked on a continuous treatment variable, where model_t was not a RandomForestClassifier and discrete_treatment was False. Code:
Error:
Hello,
When calling DynamicDML() as such:
Here Y, T, X and groups are in long format and have the following shapes: ((32382,), (32382,), (32382, 8)), (32382,), where n = N*Time = 32,382, with N = 1542 cross-sectional units and Time = 21 months, and groups has N distinct ids corresponding to the distinct cross-sectional units. I have already balanced the panel.
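A minimal numpy sketch of the long-format layout described above, filled with synthetic placeholder values. The helpers long_format_panel and is_balanced are hypothetical and not part of EconML. The sketch reproduces the stated shapes for N = 1542 units and Time = 21 months (1542 × 21 = 32,382) and checks that the panel is balanced, i.e. that every group id appears exactly Time times:

```python
import numpy as np

def long_format_panel(n_units, n_periods, n_features, seed=0):
    """Hypothetical synthetic balanced panel in long format, rows sorted by unit, then period."""
    rng = np.random.default_rng(seed)
    n = n_units * n_periods
    groups = np.repeat(np.arange(n_units), n_periods)   # unit id for every row
    Y = rng.normal(size=n)                               # placeholder outcome
    T = np.zeros(n, dtype=int)                           # placeholder binary treatment
    X = rng.normal(size=(n, n_features))
    X = (X - X.mean(axis=0)) / X.std(axis=0)             # zero mean, unit variance per column
    return Y, T, X, groups

def is_balanced(groups, n_periods):
    """True if every group id appears exactly n_periods times."""
    _, counts = np.unique(groups, return_counts=True)
    return bool(np.all(counts == n_periods))

Y, T, X, groups = long_format_panel(n_units=1542, n_periods=21, n_features=8)
print((Y.shape, T.shape, X.shape), groups.shape)
# → ((32382,), (32382,), (32382, 8)) (32382,)
print(len(np.unique(groups)), is_balanced(groups, 21))
# → 1542 True
```

The rows here are sorted by unit and then by period; treating that as the ordering DynamicDML expects is an assumption worth confirming in the EconML documentation.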
T is a binary, discrete treatment, and I see that the default value for discrete_treatment is False. When instantiating with discrete_treatment=True, I get:
AttributeError: Provided crossfit folds contain training splits that don't contain all treatments
arising from
and it appears Target is a 1-hot encoding of T; if so, then this condition:
(not np.any(np.all(pd.get_dummies(T,dtype=int) == 0, axis=1)))
is True, leading to the AttributeError. The way I am coding T is: for each cross-sectional unit and month observation, T=0 if that unit is not treated yet and T=1 once it becomes treated, after which it remains 1, while controls have T=0
for all months. I imagine this is fine?
I'm using RandomForestClassifier for model_t and GradientBoostingRegressor for model_y.
The correct instantiation would be the one with discrete_treatment=True, so that is the error I am more concerned about; I am just providing full context. With discrete_treatment=False, I get the following error:
Co-variance matrix is underdetermined. Inference will be invalid!
This holds with or without the inclusion of X, which has been standardized so that each feature has zero mean and unit variance.
Thanks,
Saman
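The AttributeError above says that some cross-fitting training split is missing a treatment level. With the staggered coding described above, early months can contain no treated units at all, even though the pooled T has both values. If DynamicDML fits its nuisance models period by period (an assumption here, not checked against the EconML source), such a month would lack T=1 in every training split. A minimal numpy sketch that builds this coding on a toy panel and lists the (fold, period) pairs whose training rows lack a level. The helpers staggered_treatment, group_kfold and missing_levels are hypothetical, and group_kfold only mimics group-preserving splits; it is not EconML's own splitter:

```python
import numpy as np

def staggered_treatment(adoption_period, n_periods):
    """Long-format binary treatment, rows ordered unit by unit, then period.

    adoption_period[i] is the first period in which unit i is treated; -1 marks a
    never-treated control (a convention chosen for this sketch). T stays 1 after adoption.
    """
    adoption_period = np.asarray(adoption_period)
    periods = np.arange(n_periods)
    treated = (adoption_period[:, None] >= 0) & (periods[None, :] >= adoption_period[:, None])
    groups = np.repeat(np.arange(len(adoption_period)), n_periods)
    period = np.tile(periods, len(adoption_period))
    return treated.astype(int).ravel(), groups, period

def group_kfold(groups, n_splits=2, seed=0):
    """Yield (train, test) row indices, keeping each group entirely inside one fold."""
    unique_groups = np.unique(groups)
    rng = np.random.default_rng(seed)
    fold_of_group = rng.permutation(len(unique_groups)) % n_splits
    row_fold = fold_of_group[np.searchsorted(unique_groups, groups)]
    for k in range(n_splits):
        yield np.flatnonzero(row_fold != k), np.flatnonzero(row_fold == k)

def missing_levels(T, period, folds):
    """List (fold, period) pairs whose training rows lack a treatment level seen in the full data."""
    levels = np.unique(T)
    problems = []
    for k, (train, _test) in enumerate(folds):
        for p in np.unique(period):
            observed = np.unique(T[train][period[train] == p])
            if not np.array_equal(observed, levels):
                problems.append((k, int(p)))
    return problems

# Toy panel: 4 units over 4 periods; units 0 and 1 adopt in periods 2 and 3, units 2 and 3 are controls.
T, groups, period = staggered_treatment([2, 3, -1, -1], n_periods=4)
print({int(p): np.unique(T[period == p]).tolist() for p in range(4)})
# → {0: [0], 1: [0], 2: [0, 1], 3: [0, 1]}
# Includes (0, 0), (1, 0), (0, 1), (1, 1): no unit is treated yet in periods 0 and 1.
print(missing_levels(T, period, group_kfold(groups, n_splits=2)))
```

If the early periods are the ones flagged, reshuffling the folds would not remove them, since no split can contain a treated unit in a period where nobody is treated yet.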