Data quality test suite saved as HTML is much bigger than data quality preset metric report (300MB vs. 3MB) #1092

Open
billlyzhaoyh opened this issue May 2, 2024 · 3 comments

Comments

@billlyzhaoyh

billlyzhaoyh commented May 2, 2024

The two files in the screenshot are generated with the code below:

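import os

from evidently.metric_preset import DataQualityPreset
from evidently.report import Report
from evidently.test_preset import DataDriftTestPreset, DataQualityTestPreset, DataStabilityTestPreset
from evidently.test_suite import TestSuite

# df, data_column_mapping and data_profile_dir are defined elsewhere.
print("Generating data quality report...")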
data_quality_report = Report(metrics=[
    DataQualityPreset(),
])
data_quality_report.run(reference_data=df, current_data=df, column_mapping=data_column_mapping)
data_quality_report.save_html(
    os.path.join(data_profile_dir, "data_quality.html")
)
print("Data quality report generated successfully!")
print("Running data quality test suite...")
data_quality_test_suite = TestSuite(tests=[
    DataDriftTestPreset(),
    DataQualityTestPreset(),
    DataStabilityTestPreset(),
])
data_quality_test_suite.run(reference_data=df, current_data=df, column_mapping=data_column_mapping)
data_quality_test_suite.save_html(
    os.path.join(data_profile_dir, "data_quality_test.html")
)
print("Data quality test suite generated successfully!")
[Image: Screenshot 2024-05-02 at 17 15 05]

What can I do to shrink the size of the HTML output from the test suite?

@elenasamuylova
Collaborator

Hi @billlyzhaoyh,

In the second instance (when you combine multiple Test Presets), you generate a very large number of column-level tests, compared to the first instance (where DataQualityPreset() generates summaries for all columns only once).

Many of these individual Tests have a visual render (e.g., distribution of each column), increasing the resulting HTML's size.

The solution is to create a custom Test Suite that includes the individual Tests you'd like to see, instead of combining Test Presets.
https://docs.evidentlyai.com/user-guide/tests-and-reports/custom-test-suite
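
As a rough sketch of what such a custom Test Suite could look like: the four dataset-level tests below are an illustrative selection, not a recommendation, and the import paths assume the Evidently 0.4.x package layout that was current when this issue was opened. `build_dataset_level_suite` is a hypothetical helper name. Fewer tests should mean fewer embedded renders, but how much the HTML shrinks depends on the data and on which tests you keep.

```python
def build_dataset_level_suite():
    """Build a TestSuite from a few hand-picked dataset-level tests.

    The selection below is illustrative, not a recommendation.
    """
    # Imported inside the function so this module loads even where
    # Evidently is not installed. Paths assume the 0.4.x package layout.
    from evidently.test_suite import TestSuite
    from evidently.tests import (
        TestNumberOfColumnsWithMissingValues,
        TestNumberOfConstantColumns,
        TestNumberOfDuplicatedRows,
        TestShareOfDriftedColumns,
    )

    return TestSuite(tests=[
        TestNumberOfColumnsWithMissingValues(),
        TestNumberOfConstantColumns(),
        TestNumberOfDuplicatedRows(),
        TestShareOfDriftedColumns(),
    ])


# Usage with the variables from the original post:
# suite = build_dataset_level_suite()
# suite.run(reference_data=df, current_data=df, column_mapping=data_column_mapping)
# suite.save_html(os.path.join(data_profile_dir, "data_quality_test.html"))
```

Column-level tests can be added to the list one at a time for the columns that matter most, which keeps the number of rendered plots under your control.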

@billlyzhaoyh
Author

Thank you for this, @elenasamuylova. I tried to look this up but could not find an answer: is there any way to disable the visual render functionality in favour of a smaller HTML file?

@elenasamuylova
Collaborator

Hi @billlyzhaoyh, I am afraid there is no such feature currently. However, you can export the results as a JSON or Python dictionary instead: https://docs.evidentlyai.com/user-guide/tests-and-reports/run-tests#output-formats
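
A minimal sketch of the dictionary route, in case only pass/fail information is needed: it takes the dictionary returned by the suite's `as_dict()` method and writes just the summary plus each test's name, status and description to a JSON file. The key names (`"summary"`, `"tests"`, `"name"`, `"status"`, `"description"`) are assumptions about the dictionary layout and should be checked against your Evidently version; `save_compact_results` is a hypothetical helper, not part of the library.

```python
import json
from pathlib import Path


def save_compact_results(results: dict, path) -> int:
    """Write a trimmed copy of the test results as JSON.

    Keeps the summary and, for each test, only its name, status and
    description. Returns the size of the serialized payload in bytes.
    """
    compact = {
        # Key names are assumptions about the as_dict() layout.
        "summary": results.get("summary", {}),
        "tests": [
            {key: test.get(key) for key in ("name", "status", "description")}
            for test in results.get("tests", [])
        ],
    }
    payload = json.dumps(compact, indent=2)
    Path(path).write_text(payload, encoding="utf-8")
    return len(payload.encode("utf-8"))


# Usage with the suite from the original post (requires Evidently):
# results = data_quality_test_suite.as_dict()
# save_compact_results(results, os.path.join(data_profile_dir, "data_quality_test.json"))
```

Dropping the per-test parameters keeps the file small; if you need them for debugging, add `"parameters"` to the tuple of kept keys.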
