
[FEAT] Support NLP text regression #2625

Open
j-adamczyk opened this issue Jul 9, 2023 · 3 comments
Labels
ds (Tasks suited for Data Scientists), linear, nlp (Affects deepchecks.nlp package)

Comments

@j-adamczyk

Is your feature request related to a problem? Please describe.

Currently only token classification and text classification are supported for NLP. However, there are important cases for text regression, for example:

  • CTR prediction for advertisements
  • sentiment magnitude prediction, e.g. GCP sentiment analysis predicts continuous values instead of classes
  • ordinal regression for texts, e.g. predicting number of stars from 1 to 5 based on review text

Describe the solution you'd like

Support for text regression, similar to tabular regression, but for NLP models, e.g. checking regression error distribution or train-test degradation for regression metrics.
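To illustrate the kind of check being requested, here is a minimal plain-Python sketch of a train-test degradation condition on a regression metric. The function names and the default threshold are illustrative assumptions, not the deepchecks API:

```python
# Illustrative sketch of a train-test degradation check for regression.
# Names and threshold are hypothetical, not the deepchecks API.

def mae(y_true, y_pred):
    """Mean absolute error over paired true/predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def train_test_degradation(train_true, train_pred, test_true, test_pred,
                           max_relative_degradation=0.2):
    """Compare MAE on train vs. test and flag excessive relative degradation."""
    train_score = mae(train_true, train_pred)
    test_score = mae(test_true, test_pred)
    degradation = (test_score - train_score) / train_score
    return {
        "train_mae": train_score,
        "test_mae": test_score,
        "degradation": degradation,
        "passed": degradation <= max_relative_degradation,
    }
```

A real check would of course wrap this in the deepchecks check/condition machinery and support configurable metrics; the sketch only shows the computation such a check would perform.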

@github-actions github-actions bot added needs triage Issue needs to be labeled and prioritized linear labels Jul 9, 2023
@noamzbr noamzbr added nlp Affects deepchecks.nlp package ds Tasks suited for Data Scientists and removed needs triage Issue needs to be labeled and prioritized labels Jul 19, 2023
@noamzbr (Collaborator) commented Jul 21, 2023

Thanks for the suggestion @j-adamczyk! Any other features you'd suggest for these task types?

@j-adamczyk (Author)

@noamzbr thank you for the fast response.

This requires a mix of regression tests and NLP tests.

From the tabular quickstart, interesting checks are:

  • train-test performance
  • regression error distribution
  • prediction drift
  • simple model comparison

Specifically, the tabular regression checks that don't make sense for NLP are weak segments performance (segments are not well defined for raw text), boosting overfit (NLP models typically don't use boosting), and model inference time (which is naturally long for NLP models).
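As a sketch of the regression error distribution check mentioned above: in the tabular version this summarizes the residuals (e.g. via their kurtosis) to detect systematic bias or heavy tails. A minimal stdlib-only illustration, with hypothetical names that are not the deepchecks API:

```python
# Illustrative sketch (not the deepchecks API): summarize the distribution
# of regression residuals, similar in spirit to the tabular
# RegressionErrorDistribution check.

def error_distribution(y_true, y_pred):
    """Return mean error and excess kurtosis of the residuals."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # variance
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth moment
    kurtosis = m4 / m2 ** 2 - 3 if m2 else 0.0        # excess kurtosis
    return {"mean_error": mean, "kurtosis": kurtosis}
```

A nonzero mean error indicates systematic over- or under-prediction; strongly negative or positive excess kurtosis hints at an unusual residual shape worth inspecting.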

From the NLP text classification quickstart, interesting checks are:

  • text property outliers
  • unknown tokens
  • under annotated property segments
  • under annotated metadata segments
  • text duplicates
  • special characters
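Combining the two directions above, a property-segment view of regression error could bucket residuals by a text property. A hypothetical stdlib-only sketch, bucketing MAE by text length (names and bucket size are illustrative, not the deepchecks API):

```python
# Hypothetical sketch: regression error segmented by a text property
# (here, character length), in the spirit of property-segment checks.

def mae_by_text_length(texts, y_true, y_pred, bucket_size=50):
    """Return MAE per text-length bucket (bucket index = len(text) // bucket_size)."""
    buckets = {}
    for text, t, p in zip(texts, y_true, y_pred):
        key = len(text) // bucket_size
        buckets.setdefault(key, []).append(abs(t - p))
    return {k: sum(errs) / len(errs) for k, errs in sorted(buckets.items())}
```

A segment with much higher MAE than the rest (e.g. very short texts) would be the NLP-regression analogue of a weak segment in the tabular checks.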

Image regression could also be added in a very similar way (but that is outside the scope of this issue).

@j-adamczyk (Author)

@noamzbr any news on this? As far as I understand, this mostly combines two existing capabilities, so little genuinely new code should be needed.
