Addition of : Thai, Romanian, Hebrew, Korean, Burmese, Nigerian (Multilingual) Datasets #724

Art3mis0707 · 2024-05-15T07:00:53Z

Checklist for adding MMTEB dataset

Addition of :

Thai Restaurant Reviews Dataset - WongnaiReviewsClassification
Hebrew Sentiment Analysis Dataset - HebrewSentimentAnalysis
Korean Financial Sentiment Analysis Dataset - KorFin
Burmese News Classification Dataset - MyanmarNews
Romanian News Classification Dataset - Moroco
Nigerian (Multilingual) Twitter Sentiment Analysis Dataset - NaijaSenti

Reason for dataset addition:
All of the datasets except the Korean one are in low-resource languages, which so far have been present mostly only in the multilingual datasets. Having monolingual datasets in low-resource languages will enrich the diversity of the benchmark.

imenelydiaker

Thanks for this addition! My comments are below.
Also, please rename the files that contain underscores by removing the underscores (e.g., Thai_Restaurant_Reviews.json -> ThaiRestaurantReviews.json).
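The renaming requested above can be sketched in a few lines of Python. This is only an illustration, not a script from the repository: the helper names `strip_underscores` and `plan_renames` are hypothetical, and the sketch assumes that dunder files such as `__init__.py` must be left untouched.

```python
from pathlib import Path


def strip_underscores(filename: str) -> str:
    """Remove underscores from the stem of a filename, keeping its extension."""
    path = Path(filename)
    return path.stem.replace("_", "") + path.suffix


def plan_renames(directory: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """List (old, new) name pairs for files in `directory`; rename only if dry_run is False."""
    renames = []
    for path in sorted(Path(directory).iterdir()):
        # Skip directories and dunder files such as __init__.py (assumption: these must keep their names).
        if not path.is_file() or path.name.startswith("__"):
            continue
        new_name = strip_underscores(path.name)
        if new_name != path.name:
            renames.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return renames
```

Inside a git checkout, `git mv` would likely be the better way to apply the planned pairs so that file history is preserved; the dry-run default only reports what would change.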

mteb/tasks/Classification/heb/Hebrew_Sentiment_Analysis.py

mteb/tasks/Classification/tha/Thai_Restaurant_Reviews.py

mteb/tasks/Classification/ron/Moroco.py

mteb/tasks/Classification/tha/ThaiRestaurantReviews.py

KranthiGV · 2024-05-15T14:25:29Z

In the interest of quick merging, I'd recommend not adding a new dataset to the same PR after significant reviewing has been done.
It would be better to start a new PR.

Art3mis0707 · 2024-05-15T16:27:28Z

I've made all the changes, @KranthiGV @imenelydiaker.
Please let me know if there's anything else I need to do.
Thanks!

mteb/tasks/Classification/ron/Moroco.py

Art3mis0707 · 2024-05-15T17:57:26Z

I have also created a jsonl file and added points: 2 for @imenelydiaker and @KranthiGV for reviewing, and 12 for myself (6x2).
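For illustration, the points arithmetic in the comment above could be written out as a small Python sketch that produces JSONL rows. Several things here are assumptions rather than facts from this thread: the field names (`GitHub`, `New dataset`, `Review PR`), the reading of "6x2" as six datasets at 2 points each, and the reading of "2 for ... reviewing" as 2 points per reviewer. The helper names are hypothetical.

```python
import json

POINTS_PER_DATASET = 2  # assumption: "(6x2)" means 6 datasets at 2 points each
POINTS_PER_REVIEW = 2   # assumption: each reviewer receives 2 points


def build_points(author: str, n_datasets: int, reviewers: list[str]) -> list[dict]:
    """Build one row per contributor; the field names are assumed, not confirmed by this PR."""
    rows = [{"GitHub": author, "New dataset": n_datasets * POINTS_PER_DATASET}]
    rows += [{"GitHub": reviewer, "Review PR": POINTS_PER_REVIEW} for reviewer in reviewers]
    return rows


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


rows = build_points("Art3mis0707", 6, ["imenelydiaker", "KranthiGV"])
points_jsonl = to_jsonl(rows)
```

The resulting string could then be written to whatever points file the project expects; the file's location and name are not stated in this conversation.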

imenelydiaker

LGTM! Can you please add your name and affiliation here and run linting?

mteb/tasks/Classification/multilingual/IndicSentimentClassification.py

Art3mis0707 · 2024-05-16T12:42:04Z

> LGTM! Can you please add your name and affiliation here and run linting?

I've done it.
Please check, thanks

imenelydiaker

LGTM, thank you for your contribution! Let's merge 🙂

Art3mis0707 and others added 9 commits May 14, 2024 19:11

Added a new dataset for the Burmese Language and ran the required tests

ac2914c

Merge branch 'main' into main

707ce7a

Moved from classifications to clustering

3a53821

Adding a Hebrew Classification Dataset

231a4a7

Added a financial sentiment analysis dataset in Korean

ff9c5c9

Added a Korean Sentiment Analysis Dataset

2b5128b

Added a Burmese News Classification Dataset

b76c14d

Added a Hebrew Sentiment Analysis Dataset

55fed75

Added a Thai dataset on restaurant reviews

ebaa41c

imenelydiaker reviewed May 15, 2024

imenelydiaker self-assigned this May 15, 2024

This was referenced May 15, 2024

Hebrew Sentiment Analysis Dataset #722

Closed

Addition of a Korean Sentiment Analysis Dataset #719

Closed

Art3mis0707 and others added 4 commits May 15, 2024 14:26

Made changes to the thai dataset and hebrew dataset

b9c4ec1

Made changes to all 4 datasets

568d477

Made changes and added a new dataset

3466280

Add files via upload

7ede011

KranthiGV reviewed May 15, 2024

Art3mis0707 added 2 commits May 15, 2024 19:39

Added a multilingual dataset on sentiment analysis in different Nigerian Languages

0fbc2d7

Made changes to the thai and romanian datasets

18779cd

Making changes and adding the new dataset through a new PR

1289147

imenelydiaker reviewed May 15, 2024

mteb/tasks/Classification/ron/Moroco.py

Art3mis0707 changed the title ~~Thai Restaurant Review Dataset~~ Addition of : Thai, Romanian, Hebrew, Korean, Burmese, Nigerian (Multilingual) Datasets May 15, 2024

Art3mis0707 requested a review from imenelydiaker May 15, 2024 17:24

Added the dialect codes for Moldavian

02d5ed0

added points

3c9d461

imenelydiaker reviewed May 16, 2024

mteb/tasks/Classification/multilingual/IndicSentimentClassification.py (outdated)

mteb/tasks/Classification/multilingual/IndicSentimentClassification.py (outdated)

Art3mis0707 and others added 4 commits May 16, 2024 17:57

Update mteb/tasks/Classification/multilingual/IndicSentimentClassification.py
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

6d27ef6

Update mteb/tasks/Classification/multilingual/IndicSentimentClassification.py
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

38fe5b0

Added to Contributor Information

ee54ec5

Added to Contributor Information after making lint

dc7682c

Art3mis0707 requested a review from imenelydiaker May 16, 2024 13:24

imenelydiaker approved these changes May 16, 2024

Merge branch 'main' into thai

790f4e7

imenelydiaker enabled auto-merge (squash) May 16, 2024 14:01

imenelydiaker merged commit 8f7817f into embeddings-benchmark:main May 16, 2024
7 checks passed
