rougeL returns 0 score on perfect prediction in some languages #441

Open

yoavkatz opened this issue Jan 1, 2024 · 1 comment

yoavkatz commented Jan 1, 2024

1. Change xlsum.py to run on all languages (remove the if lang == langs[0]: guard).
2. Run python prepare/cards/xlsum.py

Traceback (most recent call last):
File "/home/runner/work/unitxt/unitxt/tests/test_preperation.py", line 47, in test_preprations
import_module_from_file(file)
File "/home/runner/work/unitxt/unitxt/tests/test_preperation.py", line 27, in import_module_from_file
spec.loader.exec_module(module)
File "", line 843, in exec_module
File "", line 219, in _call_with_frames_removed
File "/home/runner/work/unitxt/unitxt/prepare/cards/xlsum.py", line 42, in
test_card(card, debug=False)
File "/home/runner/work/unitxt/unitxt/src/unitxt/test_utils/card.py", line 238, in test_card
test_with_eval(
File "/home/runner/work/unitxt/unitxt/src/unitxt/test_utils/card.py", line 184, in test_with_eval
raise AssertionError(error_message)
AssertionError: The results of running the main metric used in the card (rougeL) over simulated predictions that are equal to the references return a different score than expected.
One would expect a perfect score of 1.0 in this case, but the returned metric score was 0.0.
This usually indicates an error in the metric or post processors, but can also be an acceptable edge case.
In any case, this requires a review. If this is acceptable, set strict=False in the call to test_card().
The predictions passed to the metrics were:
['በታይዋን ከአንዲት ሴት አይን ውስጥ ዶክተሮች አራት ንቦችን አወጡ። በደሴቲቱ እንዲህ አይነት ነገር ታይቶም ተሰምቶም አይታወቅም ሲሉ ተናግረዋል።', 'ከሰሞኑ ባለቤትነታቸው የአረና ትግራይ ፓርቲ አባል ናቸው የተባሉ አስራ ስድስት ፍየሎች የመታሰራቸው ዜና የማህበራዊ ሚዲያ ተጠቃሚዎች መነጋገሪያ ሆኖ ቆይቷል።', 'የአሜሪካው ፕሬዝደንት ዶናልድ ትራምፕ ቲክ ቶክ የተሰኘው የተንቀሳቃሽ ምስሎች መጋሪያ በአሜሪካ ድርጅት ካልተገዛ ሊያግዱት እንደሚችሉ አስጠንቅቀዋል።']

@dafnapension

My 2 cents, having dug into this a bit:
Rouge applies its default tokenizer as the first step of computing the score. When lang = nepali, for example, no tokens are identified in either the prediction or the target, so the score is 0 for all three examples. When lang = marathi (!!) it happily latches onto a '22' that appears in the string, so both target and prediction are 'tokenized' to ['22'], and that counts as a hit; since this is one of three examples, the final score is 0.3333 for this language.

I am not sure how to automatically recognize the language, whether to rely on the known language in this case, or where to pull an adequate tokenizer from. A whitespace tokenizer does some of the work, but that is only a proof of concept (see the sketch below).
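For illustration, here is a minimal sketch of the behaviour described above, calling Google's rouge_score package directly rather than going through the unitxt code path. The WhitespaceTokenizer helper and the use of the tokenizer argument are assumptions made for this sketch, not code taken from the repository:

from rouge_score import rouge_scorer

text = "በታይዋን ከአንዲት ሴት አይን ውስጥ ዶክተሮች አራት ንቦችን አወጡ።"

# The default tokenizer keeps only [a-z0-9] runs, so this Amharic text yields
# no tokens and an identical prediction/reference pair still scores 0.
default_scorer = rouge_scorer.RougeScorer(["rougeL"])
print(default_scorer.score(text, text)["rougeL"].fmeasure)  # 0.0

# A naive whitespace tokenizer (hypothetical helper, just to prove the concept).
class WhitespaceTokenizer:
    def tokenize(self, text):
        return text.split()

ws_scorer = rouge_scorer.RougeScorer(["rougeL"], tokenizer=WhitespaceTokenizer())
print(ws_scorer.score(text, text)["rougeL"].fmeasure)  # 1.0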
