Binary outcome, and multiple binary treatments. #228

ctrivino1 · 2024-01-29T02:38:29Z

ctrivino1
Jan 29, 2024

Hi all,

I am currently trying to find the ATE of three binary treatments on a binary outcome. Which model should I use for this? I know the DoubleMLIRM model can handle binary treatments, but I can only pass one binary treatment in the DoubleMLData d_cols parameter. Does anyone have a solution or any suggestions? Thanks in advance!

Here is my code:
"""
import numpy as np
import pandas as pd
import doubleml as dml
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# Load the dataset into a pandas DataFrame
df = pd.read_csv('/content/ai4i2020.csv')

# Flag a machine failure if any of the individual failure modes is set
df['Machine failure'] = np.where((df['TWF'] == 1) | (df['HDF'] == 1) | (df['PWF'] == 1) | (df['OSF'] == 1) | (df['RNF'] == 1), 1, 0)

# Convert 'Machine failure' to integer
df['Machine failure'] = df['Machine failure'].astype(int)

# One-hot encode the categorical treatment variable and drop the ID columns
df = pd.get_dummies(df, columns=['Type']).drop(['UDI', 'Product ID'], axis=1)

# Cast the dummy columns to int
df['Type_H'] = df['Type_H'].astype(int)
df['Type_L'] = df['Type_L'].astype(int)
df['Type_M'] = df['Type_M'].astype(int)

# Define treatment, outcome, and covariates
treatments = ['Type_M']
# I would like to use this instead: ['Type_M', 'Type_H', 'Type_L']

outcome = 'Machine failure'
covariates = ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']

# Split the data into training and testing sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Create a DoubleMLData object for training
train_data_dml_base = dml.DoubleMLData(train_df, y_col=outcome, d_cols=treatments, x_cols=covariates)

# Boosted trees for classification
boost_class = XGBClassifier()

# Fix the seed so the sample splitting is reproducible
np.random.seed(123)
dml_irm = dml.DoubleMLIRM(train_data_dml_base, ml_g=boost_class, ml_m=boost_class)
dml_irm.fit(store_predictions=True)

irm_summary = dml_irm.summary

print(irm_summary)
"""

SvenKlaassen · 2024-01-29T08:40:33Z

SvenKlaassen
Jan 29, 2024
Maintainer

Hi,

The DoubleMLIRM model is only available with a single treatment. For multiple treatments, you have to fit several models separately.
Is there a specific reason why you want to use multiple treatments at once (e.g. uniform confidence intervals)?

2 replies

ctrivino1 Feb 4, 2024
Author

Hi,

I was just checking whether there is a model that can handle multiple binary treatments. In my use cases I am going to have both binary and continuous treatments, and I'm not sure whether there is a model that can handle both types.

Currently I have been using the 'DoubleMLPLR' model for my continuous data, but in reality I'm not sure which model I should choose because there are so many options. Do you have a suggestion for a model to use with continuous and binary treatments?

Would I need to create two separate DoubleMLPLR models where one contains the continuous treatments and the other contains the binary treatments? I am still relatively new to this so I'm not quite sure when I should choose certain models over others. Thanks!

SvenKlaassen Feb 6, 2024
Maintainer

You are correct that the DoubleMLPLR model can handle binary treatments, but you should use two separate models if you want to use both binary and continuous treatments. The reason is that the learner for ml_m is used either as a regressor or as a classifier (see DoubleMLPLR).
You can define two DoubleMLData objects. The first uses only the continuous treatments and adds the binary treatments to the controls x; there you can use a regressor as the learner for ml_m. The second uses the binary treatments and instead adds the continuous treatments to the controls; there you should use a classifier as the learner for ml_m.

For binary treatments, you could also use the DoubleMLIRM model, but there you would have to fit a separate model for each treatment (same procedure as above). The DoubleMLIRM model imposes a bit less structure (full interactions between treatment and confounders are possible), whereas the DoubleMLPLR model assumes an additive effect.
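
For reference, the structural difference can be written out as follows. This uses the usual textbook notation for the two models, which is not quoted from this thread. In both cases the treatment equation is $D = m_0(X) + V$ with $\mathbb{E}[V \mid X] = 0$.

$$
\text{PLR:}\quad Y = \theta_0 D + g_0(X) + \zeta, \qquad \mathbb{E}[\zeta \mid D, X] = 0
$$

$$
\text{IRM:}\quad Y = g_0(D, X) + U, \qquad \mathbb{E}[U \mid D, X] = 0, \qquad \theta_0 = \mathbb{E}\big[g_0(1, X) - g_0(0, X)\big]
$$

In the PLR model the treatment enters additively with a single coefficient $\theta_0$, while in the IRM model $g_0$ may depend on $D$ and $X$ jointly and the ATE averages the difference between the two potential-outcome regressions.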

I hope this can help with the decision.

ctrivino1 · 2024-02-08T18:18:48Z

ctrivino1
Feb 8, 2024
Author

Thank you Mr. Klaassen, I very much appreciate your input.

I have created a function that uses your library to compute the ATE for various datasets, where the outcome and treatment variables can be continuous or binary. The function also includes a method to find the top n treatments if the user does not know which factors are important to include in the treatment list. Would you like to inspect this functionality and potentially add it to your library? Thanks!

9 replies

SvenKlaassen Mar 17, 2024
Maintainer

Thank you, this looks very interesting.
However, I think the approach might not always be correct. For example, in https://docs.doubleml.org/dev/workflow/workflow.html#problem-formulation, conditioning on participation might open a backdoor path if one wants to estimate the effect of income (conditioning on a collider). Therefore, exchanging treatments $D$ and confounders $X$ might not always be the best option.
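
The collider concern can be illustrated with a small simulation. This is a generic sketch with hypothetical variables `d` (treatment), `y` (outcome), and `c` (a collider caused by both), not the data from the linked example. It uses plain least squares rather than doubleml to keep the mechanism visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

d = rng.normal(size=n)            # treatment
y = 1.0 * d + rng.normal(size=n)  # outcome; the true effect of d on y is 1.0
c = d + y + rng.normal(size=n)    # collider: caused by both d and y


def ols_coefs(target, *regressors):
    """Least-squares slopes of target on the regressors (intercept dropped)."""
    design = np.column_stack([np.ones(len(target)), *regressors])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[1:]


coef_unadjusted = ols_coefs(y, d)[0]   # close to the true effect of 1.0
coef_collider = ols_coefs(y, d, c)[0]  # pulled toward 0 in this design

print(f"without the collider as control: {coef_unadjusted:.3f}")
print(f"with the collider as control:    {coef_collider:.3f}")
```

In this particular design, adding `c` as a control drives the estimated coefficient on `d` toward zero even though the true effect is 1.0, which is the kind of bias that can arise when a variable is moved into the controls without checking its causal role.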

ctrivino1 Mar 17, 2024
Author

Hi,

I appreciate the feedback, and I think I understand what you are saying. Do you have a suggestion for how I could account for this? Do you think giving the user an option to choose which covariates to use in the model would help address it?

Would this project be something that you and your team may be interested in having some further discussion about?

ctrivino1 Mar 17, 2024
Author

As I have been incorporating your package with my work projects I have found some interesting insights and have been invited to a conference to present your method/package and my findings this June. Any help to make this as accurate as possible would be much appreciated.

SvenKlaassen Mar 17, 2024
Maintainer

I think the only way to account for this would be either to let the user specify which covariates to use for each treatment, or to provide a DAG and infer the corresponding adjustment sets for each "treatment".

Currently, our team is quite occupied, so we do not have the capacity for such a project, but you can ask questions at any time and I (or another team member) will try to answer them.
And any feedback for us is also helpful.

ctrivino1 Mar 17, 2024
Author

Thank you for your feedback; it is much appreciated. I will be sure to look into how I can address this issue.

If I have any feedback I will be sure to send it your way.

Thank you for your time!
