Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue in setter of y_col properties for objects of class DoubleMLData: Objects mutate despite the fact that a ValueError was raised #144

Open
MalteKurz opened this issue Mar 23, 2022 · 0 comments
Assignees
Labels
bug Something isn't working exception handling

Comments

@MalteKurz
Copy link
Member

Describe the bug

Bug reported by @ShreyDixit:

Assume that one successfully initializes an object of class DoubleMLData. Then alters a property like y_col in a way that violates some basic assumptions (e.g., the same variable cannot be at the same time the outcome variable y_col and the treatment variable d_cols). This results in a ValueError being raised. However, nevertheless the object mutates and violates the basic assumption.

--> So while the ValueError is appropriately raised, the object nevertheless mutates and the y_col property is changed. The root cause is in the setter for the y_col property

@y_col.setter
def y_col(self, value):
reset_value = hasattr(self, '_y_col')
if not isinstance(value, str):
raise TypeError('The outcome variable y_col must be of str type. '
f'{str(value)} of type {str(type(value))} was passed.')
if value not in self.all_variables:
raise ValueError('Invalid outcome variable y_col. '
f'{value} is no data column.')
self._y_col = value
if reset_value:
self._check_disjoint_sets()
self._set_y_z()

Basically the value shouldn't be set before all checks have been successfully applied. However, in its current form the _check_disjoint_sets() check requires that the properties have been set already. The same issue also applies to the other setters for properties like d_cols, x_cols, etc. Note however, that this issue only becomes relevant if an object of class DoubleMLData has been initialized successfully and if then the user alters one of the properties in a way that violates _check_disjoint_sets().

Minimum reproducible code snippet

Code block 1

from doubleml.datasets import make_plr_CCDDHNR2018
dml_data = make_plr_CCDDHNR2018()
print(dml_data.y_col)
dml_data.y_col = 'd'

Code block 2

print(dml_data.y_col)

Expected Result

First code block: dml_data.y_col == 'y' and raise exception

ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.

Second code block: dml_data.y_col == 'y' should still hold.

Actual Result

First code block: dml_data.y_col == 'y' and raise exception

ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.

Second code block: dml_data.y_col == 'd'

Versions

Python 3.9.7
DoubleML 0.4.1
Scikit-Learn 1.0.1

@MalteKurz MalteKurz added the bug Something isn't working label Mar 23, 2022
@MalteKurz MalteKurz self-assigned this Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exception handling
Projects
None yet
Development

No branches or pull requests

1 participant