Custom StatTest Failing in Python Due to missing Python Engine #1064

Open
rygeorge3 opened this issue Apr 8, 2024 · 1 comment

Comments

@rygeorge3

When attempting to implement a custom StatTest in Python, the run call fails with the following error:

  • Test failed with exceptions: 'mann-whitney-u' is not implemented for <class 'evidently.calculation_engine.python_engine.PythonEngine'>

See below for the code to reproduce the error.

import pandas as pd
import numpy as np

from scipy.stats import mannwhitneyu
from sklearn import datasets

from evidently.calculations.stattests import StatTest
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

# Dataset for Data Quality and Integrity
adult_data = datasets.fetch_openml(name='adult', version=2, as_frame='auto')
adult = adult_data.frame

adult_ref = adult[~adult.education.isin(['Some-college', 'HS-grad', 'Bachelors'])]
# Copy so the NaN assignment below modifies an independent frame, not a slice of adult
adult_cur = adult[adult.education.isin(['Some-college', 'HS-grad', 'Bachelors'])].copy()

adult_cur.iloc[:2000, 3:5] = np.nan

def _mann_whitney_u(reference_data: pd.Series, current_data: pd.Series, _feature_type: str, threshold: float):
    p_value = mannwhitneyu(np.array(reference_data), np.array(current_data))[1]
    return p_value, p_value < threshold

mann_whitney_stat_test = StatTest(
    name="mann-whitney-u",
    display_name="mann-whitney-u test",
    func=_mann_whitney_u,
    allowed_feature_types=["num"]
)

data_drift_dataset_tests = TestSuite(tests=[
    TestShareOfDriftedColumns(num_stattest=mann_whitney_stat_test),
])

data_drift_dataset_tests.run(reference_data=adult_ref, current_data=adult_cur)
data_drift_dataset_tests

@Nakulbajaj101

I tried with 0.4.2 and it worked. On 0.4.7 there were some other issues, and 0.4.0 was missing some tests. They definitely broke something in the Python engine, apparently since they made func a property.
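
For anyone comparing versions, it may help to check which evidently release is installed before running the reproduction. The sketch below relies only on the standard library; `REPORTED_WORKING`, `parse_version`, and `installed_evidently_version` are hypothetical names for illustration, and 0.4.2 is simply the version reported as working in this thread, not an official compatibility statement.

```python
import re
from importlib import metadata

# Version reported as working in this thread (anecdotal, not an official statement).
REPORTED_WORKING = (0, 4, 2)


def parse_version(text):
    """Return the leading 'X.Y.Z' of a version string as a tuple of ints, or None."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_evidently_version():
    """Return the installed evidently version as a tuple, or None if unavailable."""
    try:
        return parse_version(metadata.version("evidently"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_evidently_version()
    if version is None:
        print("evidently is not installed, or its version could not be parsed")
    elif version == REPORTED_WORKING:
        print("running the version reported as working in this thread")
    else:
        print("installed version", version, "differs from", REPORTED_WORKING)
```

If the installed version differs, pinning with `pip install evidently==0.4.2` is one way to test whether the custom StatTest behaves differently on that release.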
