
Bug: Cannot save html classification report when target column and possible labels do not match. #1070

Open
lorenzomassimiani opened this issue Apr 11, 2024 · 2 comments

Comments

@lorenzomassimiani

lorenzomassimiani commented Apr 11, 2024

With this csv:

target,cat,dog,giraffe
cat,0.8,0.1,0.1
dog,0.3,0.3,0.4

When I build the multiclass classification report using:

import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import ClassificationPreset
from evidently.report import Report

df = pd.read_csv("animals.csv")

# the "target" column holds the true label; every other column is the predicted
# probability for the corresponding label
column_mapping = ColumnMapping()
column_mapping.target = "target"
column_mapping.prediction = list(df.loc[:, df.columns != "target"])

classification_performance_report = Report(metrics=[ClassificationPreset()])
classification_performance_report.run(current_data=df, reference_data=None, column_mapping=column_mapping)

I get, as expected, some warnings of this kind:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.

The report is then generated correctly, but when I try to save it in HTML format to use it in my Streamlit app:

classification_performance_report.save_html("report.html")

I get this error:

Traceback (most recent call last):
  File "/evidently-report/utils/csv2report.py", line 32, in <module>
    classification_performance_report.save_html(report_filepath)
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/suite/base_suite.py", line 207, in save_html
    dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/report/report.py", line 212, in _build_dashboard_info
    html_info = renderer.render_html(test)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py", line 73, in render_html
    metric_result = obj.get_result()
                    ^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/base_metric.py", line 232, in get_result
    raise result.exception
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculation_engine/engine.py", line 42, in execute_metrics
    calculations[metric] = calculation.calculate(context, converted_data)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculation_engine/python_engine.py", line 88, in calculate
    return self.metric.calculate(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py", line 45, in calculate
    current = calculate_metrics(
              ^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculations/classification_performance.py", line 382, in calculate_metrics
    roc_auc = metrics.roc_auc_score(binaraized_target, prediction_probas_array, average="macro")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 580, in roc_auc_score
    return _average_binary_score(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_base.py", line 118, in _average_binary_score
    score[c] = binary_metric(y_true_c, y_score_c, sample_weight=score_weight)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 339, in _binary_roc_auc_score
    raise ValueError(
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

It would be better if this scenario were handled by setting the ROC AUC score for that class to 0 (or 1).
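
A minimal sketch of the handling I have in mind (not Evidently's actual code, just an illustration with the values from the csv above): compute the per-class ROC AUC only for labels that actually occur in the target, and fall back to 0.0 for the rest, instead of letting roc_auc_score raise.

import numpy as np
from sklearn import metrics
from sklearn.preprocessing import label_binarize

labels = ["cat", "dog", "giraffe"]              # all prediction columns
y_true = ["cat", "dog"]                         # target column from the csv
probas = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])

binarized = label_binarize(y_true, classes=labels)
per_class_auc = []
for i in range(len(labels)):
    if len(np.unique(binarized[:, i])) < 2:
        # label never appears in the target: ROC AUC is undefined, default to 0.0
        per_class_auc.append(0.0)
    else:
        per_class_auc.append(metrics.roc_auc_score(binarized[:, i], probas[:, i]))

roc_auc = float(np.mean(per_class_auc))         # the "macro" average over classes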

@EgonFerri

I have the same problem. I did a bit of investigating, hoping we could work together to resolve the issue, but the solution did not seem very straightforward to me.

The main issue is that, when some of the possible labels never appear in the target column of the .csv file, certain metrics become meaningless, and when Evidently tries to calculate them via scikit-learn, the calculation errors out.
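
A minimal reproduction outside Evidently (using the probabilities the report would see for the missing "giraffe" label) shows it is the underlying scikit-learn call that refuses to compute the score:

import numpy as np
from sklearn.metrics import roc_auc_score

# after binarization, the "giraffe" column of y_true is all zeros,
# because the label never appears in the target column
roc_auc_score(np.array([0, 0]), np.array([0.1, 0.4]))
# -> ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.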

Philosophically, I believe this shouldn't happen. Even if, by the law of large numbers, it should be rare for labels to be missing from large samples, a tool like Evidently should be able to handle scenarios with missing labels: they occur quite frequently, both in testing/debugging scenarios and in standard tasks where one label is significantly less prevalent (e.g., spam detection, anomaly detection, forgery detection).

Practically speaking, fixing this is not trivial. Ideally, the report should be generated without omitting the plots whose metric calculations fail; instead, those plots should include placeholders for the missing labels. However, this is not easy to achieve, since the code relies heavily on scikit-learn's abstractions. Should we ask scikit-learn to modify the ROC AUC function to accommodate labels absent from the target? That approach seems wrong, because the statistic itself becomes irrelevant from a statistical perspective. The solution should therefore come from a higher level, although integrating such a change elegantly with Evidently's use of scikit-learn is challenging, if it is even the best approach at all.

We could force the set of labels to contain all of them, or insert dummy data; although this should work, it is not a definitive solution.
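
A rough sketch of the dummy-data option (only a stopgap, since it slightly distorts every metric): pad the current data with one row per label that never appears in the target column.

import pandas as pd

df = pd.read_csv("animals.csv")
prediction_cols = [c for c in df.columns if c != "target"]
missing = set(prediction_cols) - set(df["target"].unique())

# one dummy row per missing label, with probability 1.0 on its own column
dummy_rows = [
    {"target": label, **{col: (1.0 if col == label else 0.0) for col in prediction_cols}}
    for label in missing
]
df_padded = pd.concat([df, pd.DataFrame(dummy_rows)], ignore_index=True)
# pass df_padded instead of df to classification_performance_report.run(...)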

I'd like to help, but I'm not sure where to start. @emeli-dral, @mike0sv, what do you think? Thanks in advance, and great work on this project.

@EgonFerri

Sorry @elenasamuylova, could we get an opinion on this? :)
