Standard metrics #658

Closed
wants to merge 11 commits into from

Conversation

dafnapension
Collaborator

No description provided.

@dafnapension force-pushed the standard_metrics branch 2 times, most recently from 79f30cf to d877d7c on March 13, 2024 16:06

codecov bot commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 59.91561%, with 95 lines in your changes missing coverage. Please review.

Project coverage is 89.05%. Comparing base (cdf0348) to head (91805ce).

Files                             Patch %    Lines
src/unitxt/standard_metrics.py    58.03%     94 Missing ⚠️
src/unitxt/operators.py           88.88%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
- Coverage   89.83%   89.05%   -0.78%     
==========================================
  Files          96       97       +1     
  Lines        9118     9350     +232     
==========================================
+ Hits         8191     8327     +136     
- Misses        927     1023      +96     

☔ View full report in Codecov by Sentry.

@yoavkatz
Member

@dafnapension @elronbandel - Can you explain the motivation for this PR? What are standard metrics and how do they relate to the existing metrics?

@dafnapension
Collaborator Author

The current evaluation of a global metric starts by laying out the whole stream in main memory, and then placing "next to it" a couple of hundred copies thereof (for the re-samplings).
This breaks the 'streaming' spirit of unitxt.
We are trying to see whether the global metrics can also be streamed.
To this end, we implement the following for each global metric (a sketch follows below):
(1) an instance scorer that scores each individual instance (like today's);
(2) an accumulator that accumulates what it needs from each instance, without copying the whole instance. E.g., F1 accumulates the confusion matrix (a count of occurrences of each (ref, pred) pair) over all the instances. This counter is expected to be dramatically smaller than the whole evaluated stream;
(3) a function yielding the final global score from the accumulated value.
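
A minimal sketch of this three-part shape, using hypothetical names (`StreamingF1` and its methods are illustrative, not the classes in this PR):

```python
from collections import Counter

class StreamingF1:
    """Accumulates only a confusion matrix, never the full stream."""

    def __init__(self):
        self.confusion = Counter()  # counts of (reference, prediction) pairs

    def score_instance(self, reference, prediction):
        # (1) instance scorer: per-instance score, as today
        return 1.0 if reference == prediction else 0.0

    def accumulate(self, reference, prediction):
        # (2) accumulator: keep only what the global score needs
        self.confusion[(reference, prediction)] += 1

    def global_score(self):
        # (3) finalizer: macro F1 computed from the accumulated confusion matrix
        labels = {r for r, _ in self.confusion} | {p for _, p in self.confusion}
        f1s = []
        for label in labels:
            tp = self.confusion[(label, label)]
            fp = sum(c for (r, p), c in self.confusion.items() if p == label and r != label)
            fn = sum(c for (r, p), c in self.confusion.items() if r == label and p != label)
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
        return sum(f1s) / len(f1s) if f1s else 0.0
```

The memory footprint is bounded by the number of distinct (ref, pred) pairs rather than by the stream length.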

The resampling is somewhat trickier:
Today, we generate a single resample by selecting, with replacement, n instances from a stream of length n, and we repeat that process for as many resamples as we want to use.
This process does not suit streaming, so we suggest the following:
Given an instance i, for each resample r (which we want to learn from without first building it), we randomly pick the number of times b that i is to participate in r.
A Poisson distribution for picking b is exactly what we need here, being a close approximation of the binomial distribution induced by the usual selection with replacement.
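
An illustrative sketch of this Poisson-bootstrap idea, again with hypothetical names; it streams a per-resample (correct, total) accumulator for a simple accuracy metric, not the actual metric classes in this PR:

```python
# Instead of materializing each resample, draw for every incoming instance the
# number of times b it participates in each resample: b ~ Poisson(1), which
# approximates Binomial(n, 1/n), i.e. selection with replacement from n instances.
import numpy as np

rng = np.random.default_rng(seed=0)
n_resamples = 100

# One small accumulator per resample, e.g. (correct, total) for accuracy.
accumulators = [[0, 0] for _ in range(n_resamples)]

def consume(reference, prediction):
    counts = rng.poisson(lam=1.0, size=n_resamples)  # b for each resample
    for acc, b in zip(accumulators, counts):
        acc[0] += int(b) * int(reference == prediction)
        acc[1] += int(b)

# After the stream ends, each accumulator yields one bootstrap replicate:
#   scores = [correct / total for correct, total in accumulators if total > 0]
# and the confidence interval comes from percentiles of `scores`.
```

This keeps memory proportional to the number of resamples times the accumulator size, independent of the stream length.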

@yoavkatz
Member

yoavkatz commented Mar 15, 2024

@elronbandel @dafnapension - I'm sure you have discussed this between yourselves a lot, but I want to offer a different perspective.

I think streaming in unitxt may be useful if unitxt is used for large-scale training; however, it also carries a significant cost in terms of code and API complexity. In evaluation, where typically only hundreds of samples are tested, streaming will have no significant value.

We need a metrics API whose metrics are
(1) independent of each other, and
(2) easy for users to add AND debug.

Our direction should be simplification, not making things more complex.

Therefore, I think it is worth discussing whether this direction will have a net gain in terms of unitxt acceptance.

(@eladven - I will be glad for your input as well.)

@dafnapension
Collaborator Author

dafnapension commented May 20, 2024

Leaving this for now. If continued at all, it will be via #845.
