Standard metrics #658

Closed
wants to merge 11 commits into from

Conversation

dafnapension
Collaborator

No description provided.

@dafnapension force-pushed the standard_metrics branch 2 times, most recently from 79f30cf to d877d7c on March 13, 2024 16:06

codecov bot commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 59.91561%, with 95 lines in your changes missing coverage. Please review.

Project coverage is 89.05%. Comparing base (cdf0348) to head (91805ce).

Files                             Patch %    Lines
src/unitxt/standard_metrics.py    58.03%     94 Missing ⚠️
src/unitxt/operators.py           88.88%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
- Coverage   89.83%   89.05%   -0.78%     
==========================================
  Files          96       97       +1     
  Lines        9118     9350     +232     
==========================================
+ Hits         8191     8327     +136     
- Misses        927     1023      +96     

☔ View full report in Codecov by Sentry.

@yoavkatz
Member

@dafnapension @elronbandel - Can you explain the motivation for this PR? What are standard metrics and how do they relate to the existing metrics?

@dafnapension
Collaborator Author

The current evaluation of a global metric starts by laying out the whole stream in main memory, and then placing "next to it" a couple of hundred copies thereof (for the re-samplings).
This breaks the 'streaming' spirit of unitxt.
We are trying to see whether the global metrics can also be streamed.
To this end, we implement the following for each global metric (a sketch follows below):
(1) an instance scorer that scores each individual instance (like today's);
(2) an accumulator that accumulates what it needs from each instance, without copying the whole instance. E.g., F1 accumulates the confusion matrix (a count of occurrences of each (ref, pred) pair) over all the instances. This counter is expected to be dramatically smaller than the whole evaluated stream;
(3) a function yielding the final global score from the accumulated value.
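
A minimal sketch of this three-part shape, using hypothetical names (`StreamingF1` and its methods are illustrative, not the classes in this PR):

```python
from collections import Counter

class StreamingF1:
    """Accumulates only a confusion matrix, never the full stream."""

    def __init__(self):
        self.confusion = Counter()  # counts of (reference, prediction) pairs

    def score_instance(self, reference, prediction):
        # (1) instance scorer: per-instance score, as today
        return 1.0 if reference == prediction else 0.0

    def accumulate(self, reference, prediction):
        # (2) accumulator: keep only what the global score needs
        self.confusion[(reference, prediction)] += 1

    def global_score(self):
        # (3) finalizer: macro F1 computed from the accumulated confusion matrix
        labels = {r for r, _ in self.confusion} | {p for _, p in self.confusion}
        f1s = []
        for label in labels:
            tp = self.confusion[(label, label)]
            fp = sum(c for (r, p), c in self.confusion.items() if p == label and r != label)
            fn = sum(c for (r, p), c in self.confusion.items() if r == label and p != label)
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
        return sum(f1s) / len(f1s) if f1s else 0.0
```

The memory footprint is bounded by the number of distinct (ref, pred) pairs rather than by the stream length.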

The resampling is somewhat trickier:
Today, we generate a single resample by selecting, with replacement, n instances from a stream of length n, and we repeat that process for as many resamples as we want to use.
This process does not suit streaming, so we suggest the following:
Given an instance i, for each resample r (which we want to learn from without first building it), we randomly pick the number of times b that i is to participate in r.
A Poisson distribution for picking b is exactly what we need here, being a close approximation of the binomial distribution induced by the usual selection with replacement.
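
An illustrative sketch of this Poisson-bootstrap idea, again with hypothetical names; it streams a per-resample (correct, total) accumulator for a simple accuracy metric, not the actual metric classes in this PR:

```python
# Instead of materializing each resample, draw for every incoming instance the
# number of times b it participates in each resample: b ~ Poisson(1), which
# approximates Binomial(n, 1/n), i.e. selection with replacement from n instances.
import numpy as np

rng = np.random.default_rng(seed=0)
n_resamples = 100

# One small accumulator per resample, e.g. (correct, total) for accuracy.
accumulators = [[0, 0] for _ in range(n_resamples)]

def consume(reference, prediction):
    counts = rng.poisson(lam=1.0, size=n_resamples)  # b for each resample
    for acc, b in zip(accumulators, counts):
        acc[0] += int(b) * int(reference == prediction)
        acc[1] += int(b)

# After the stream ends, each accumulator yields one bootstrap replicate:
#   scores = [correct / total for correct, total in accumulators if total > 0]
# and the confidence interval comes from percentiles of `scores`.
```

This keeps memory proportional to the number of resamples times the accumulator size, independent of the stream length.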

@yoavkatz
Member

yoavkatz commented Mar 15, 2024

@elronbandel @dafnapension - I'm sure you have discussed this between yourselves a lot, but I want to offer a different perspective.

I think streaming in unitxt may be useful if unitxt is used for large-scale training; however, it also carries a significant cost in terms of code and API complexity. In evaluation, where typically only hundreds of samples are tested, streaming will have no significant value.

We need a metrics API whose metrics are
(1) independent of each other, and
(2) easy for users to add AND debug.

Our direction should be simplification, not making things more complex.

Therefore, I think it is worth discussing whether this direction will have a net gain in terms of unitxt acceptance.

(@eladven - I will be glad for your input as well.)

@dafnapension
Collaborator Author

dafnapension commented May 20, 2024

Leaving this for now. If continued at all, it will be via #845.
