evidently slow in docker #1054

Open
ankita2020 opened this issue Apr 3, 2024 · 2 comments

Comments

@ankita2020

The same code takes noticeably more time when running in Docker than when running locally, where all charts are created in less time.
Total rows in prediction data: 4000
Total rows in reference data: 20000
Numerical method: KS test, numerical threshold: 0.6
Categorical method: chi-square, categorical threshold: 0.4

To create the 2 charts (data drift and correlation chart), I use the function

from evidently.metric_preset import DataDriftPreset
from evidently.report import Report


def detect_features_drift(reference, production, column_mapping, get_scores=False):
    """
    Returns a list of (feature, value) tuples, one per numerical and categorical feature.
    If get_scores is True, value is the drift score (like p-value) for the feature;
    otherwise value is True if Data Drift is detected for the feature, else False.
    The Data Drift detection depends on the confidence level and the threshold.
    For each individual feature Data Drift is detected with the selected confidence (default value is 0.95).
    """

    data_drift_report = Report(metrics=[DataDriftPreset()])
    data_drift_report.run(reference_data=reference, current_data=production, column_mapping=column_mapping)
    report = data_drift_report.as_dict()

    # The second metric of the preset is expected to hold the per-column drift results.
    drift_by_columns = report["metrics"][1]["result"]["drift_by_columns"]
    num_features = column_mapping.numerical_features or []
    cat_features = column_mapping.categorical_features or []
    # Pick which field to report for each feature.
    key = "drift_score" if get_scores else "drift_detected"

    drifts = []
    for feature in num_features + cat_features:
        drifts.append((feature, drift_by_columns[feature][key]))

    return drifts

in my code, and that function takes a lot of time.
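To narrow down which step is slow, it may help to time the stages of the function separately in both environments. The following is only a minimal sketch using the standard library; `timed` and `profile_steps` are hypothetical helper names, and the two lambdas are stand-in workloads, not Evidently calls. In the real function the steps would be the `data_drift_report.run(...)` and `data_drift_report.as_dict()` calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_steps(steps):
    """Run each (label, callable) pair in order and return {label: seconds}."""
    timings = {}
    for label, func in steps:
        with timed(label, timings):
            func()
    return timings


if __name__ == "__main__":
    # Stand-in workloads (assumption): replace with the real report steps.
    timings = profile_steps([
        ("run", lambda: sum(i * i for i in range(200_000))),
        ("as_dict", lambda: sorted(range(50_000), reverse=True)),
    ])
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.4f}s")
```

Running the same script locally and inside the container and comparing the per-step numbers should show whether the extra time is spent in the drift calculation itself or elsewhere.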

Python version: 3.8

Libraries:

asynch==0.2.3
certifi==2022.12.7
charset-normalizer==3.1.0
ciso8601==2.3.1
click==8.1.3
clickhouse-cityhash==1.0.2.4
clickhouse-driver==0.2.6
clickhouse-sqlalchemy==0.2.5
dnspython==2.3.0
email-validator==1.3.1
evidently==0.2.7
fastapi==0.95.0
greenlet==2.0.2
gunicorn==21.2.0
h11==0.14.0
idna==3.4
joblib==1.2.0
leb128==1.0.5
lz4==4.3.3
nltk==3.8.1
numpy==1.24.2
packaging==23.0
pandas==1.5.3
pandasql==0.7.3
patsy==0.5.3
plotly==5.14.0
psycopg2==2.9.9
pydantic==1.10.7
pyspark==3.5.0
python-dateutil==2.8.2
python-multipart==0.0.6
pytz==2023.3
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
scikit-learn==1.2.2
scipy==1.10.1
six==1.16.0
sniffio==1.3.0
SQLAlchemy==1.4.52
starlette==0.26.1
statsmodels==0.13.5
tenacity==8.2.2
threadpoolctl==3.1.0
tqdm==4.65.0
typing_extensions==4.9.0
tzlocal==2.1
urllib3==1.26.15
uvicorn==0.21.1
zstd==1.5.5.1
@elenasamuylova
Collaborator

Hi @ankita2020 - you seem to be using Evidently version 0.2.7. The latest version is 0.4.18, and it includes various improvements, including some that speed up drift calculations. We recommend upgrading to this version - let us know if you observe any issues with it.

@ankita2020
Author

ankita2020 commented Apr 3, 2024

Hi @elenasamuylova - I used the same version locally, and it still takes less time to generate all the charts. However, running the code in Docker with the same version takes approximately three times longer. If the version were the issue, it should take more time on my local machine as well. Are there any Ubuntu-specific versions required in Docker, or do you have any other suggestions? My local machine runs Ubuntu 22.04, and the Docker image uses Ubuntu 18.
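Since the library versions are the same in both places, one thing that could be compared is the runtime environment itself. The thread does not establish the cause of the slowdown; the sketch below only assumes that the container might see fewer CPUs, a CPU or memory quota, or different thread settings than the local machine. `runtime_summary` and `read_first_line` are hypothetical helper names, and the cgroup paths are the usual Linux locations, which may not exist on every host.

```python
import os
import platform
from pathlib import Path


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None


def runtime_summary():
    """Collect basic facts about the CPU and memory limits this process sees."""
    has_affinity = hasattr(os, "sched_getaffinity")
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        # CPUs this process may be scheduled on (available on Linux only).
        "usable_cpus": len(os.sched_getaffinity(0)) if has_affinity else None,
        # cgroup v2 limits; None if the files are absent.
        "cgroup_v2_cpu_max": read_first_line("/sys/fs/cgroup/cpu.max"),
        "cgroup_v2_memory_max": read_first_line("/sys/fs/cgroup/memory.max"),
        # cgroup v1 CPU quota; None if the file is absent.
        "cgroup_v1_cfs_quota_us": read_first_line(
            "/sys/fs/cgroup/cpu/cpu.cfs_quota_us"
        ),
    }
    # Environment variables that commonly cap numeric library thread pools.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        info[var] = os.environ.get(var)
    return info


if __name__ == "__main__":
    for key, value in runtime_summary().items():
        print(f"{key}: {value}")
```

Printing this summary on the local machine and inside the container, and comparing the two outputs, would show whether the environments differ in available CPUs or limits.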
