Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predefined tests expect the target variable to be named 'target' regardless of the predefined ColumnMapping #1097

Open
Borka95 opened this issue May 6, 2024 · 2 comments

Comments

@Borka95
Copy link

Borka95 commented May 6, 2024

I encountered two issues while using the EvidentlyAI package.

  1. Predefined tests expect the target variable to be named 'target':
    Some of the predefined tests in the package expect the target variable in the dataset to be named 'target'. Although it is possible to specify a custom column mapping to define the target variable, some tests in the package only work if the target variable is actually named 'target'. This requires renaming the column before analysis to successfully execute the test. Additionally, the PredictionColumns must be named exactly 'prediction'. In my case, the columns were named 'prediction' and 'Regression', and the error message was: 'target' and 'prediction' columns must be defined in the dataset, despite a functioning column mapping and successful reports.

  2. Insufficient explanation when an exception occurs:
    Occasionally, an exception occurs without sufficient explanation. Instead of an informative error message, only a portion of the internal code and an exception are provided, which does not clearly explain the reason for the error. This complicates troubleshooting and understanding of the problem significantly.

Steps to reproduce:

  1. Use the EvidentlyAI package with a dataset where the target variable is not named 'target'.
  2. Execute the predefined tests in the package.
  3. The code does not work because the target variable is not named 'target'.

Expected behavior:
The predefined tests in the EvidentlyAI package should be able to run successfully regardless of the naming of the target variable. An elaborate and understandable error message should be provided in case an exception occurs to facilitate troubleshooting.

Additional information:
Package version used: [latest]
Operating system and version: [Windows 10 and VS Code]
Programming language and version used: [Python 3.9]

@elenasamuylova
Copy link
Collaborator

Hi @Borka95,

Could you share:

  • Which exact Tests / Test Suites are you using where you face the error?
  • The code where you perform column mapping?

@Borka95
Copy link
Author

Borka95 commented May 7, 2024

Hi @elenasamuylova and thank you for your quick answer :)

I have generated a synthetic dataset. To avoid typos, I am using an enum. I have revised the code in the meantime. Unfortunately, I no longer have the state of the code as it was when I wrote the issue. Therefore, I have compiled all the tests I have performed, as well as the reports created. It was necessary to rename the "Regression" column, and "Target" with a capital letter was also not accepted. The reports worked with this ColumnMapping, but not all tests did, and I don't know which ones.

class COLUMNS(Enum):
Timestamp = 'Timestamp'
Temperature = 'Temperature'
Humidity = 'Humidity'
Pressure = 'Pressure'
Speed = 'Speed'
Classification = 'Classification'
Regression = 'Regression'
Prediction = 'prediction'
Target = 'target'

column_mapping = ColumnMapping()
column_mapping.target = COLUMNS.Regression.value
column_mapping.prediction = COLUMNS.Prediction.value
column_mapping.numerical_features = features
column_mapping.categorical_features = []

Report(metrics=[ClassificationPreset()])
Report(metrics=[RegressionPreset()])
Report(metrics = [DataDriftPreset(stattest_threshold=threshold)])
Report(metrics=[
RegressionErrorBiasTable(columns=[COLUMNS.Temperature.value, COLUMNS.Humidity.value]),
RegressionDummyMetric()
])

tests = TestSuite(tests=[
TestAccuracyScore(),
TestPrecisionScore(),
TestRecallScore(),
TestF1Score(),
TestColumnDrift(column_name=COLUMNS.Temperature.value, stattest_threshold=threshold),
TestNumberOfRows(),
TestPrecisionByClass(label=1),
TestRecallByClass(label=1),
TestF1ByClass(label=1),
TestColumnDrift(column_name=COLUMNS.Humidity.value, stattest_threshold=threshold),
TestConflictTarget(),
TestConflictPrediction(),
TestTargetPredictionCorrelation(),
TestHighlyCorrelatedColumns(),
TestTargetFeaturesCorrelations(),
TestPredictionFeaturesCorrelations(),
TestCorrelationChanges(),
TestNumberOfDriftedColumns(stattest_threshold=threshold),
TestShareOfDriftedColumns(stattest_threshold=threshold),
TestColumnDrift(column_name=COLUMNS.Temperature.value, stattest='psi', stattest_threshold=threshold),
])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants