Adding a new Dataset: Arabic Reviews of SHEIN #710

Merged: 22 commits merged into embeddings-benchmark:main on May 16, 2024

Conversation

Ruqyai
Contributor

@Ruqyai Ruqyai commented May 14, 2024

Checklist for adding MMTEB dataset

Reason for dataset addition:

  • I have tested that the dataset runs with the mteb package.
  • I have run the following models on the task and added the results to the PR. These can be run using the mteb run -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
  • I have added points for my submission to the points folder using the PR number as the filename (e.g. 438.jsonl).
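The subsampling item in the checklist refers to the idea of shrinking a large dataset while keeping its label distribution roughly intact. The following is a minimal pure-Python sketch of that idea only; it is not the implementation behind self.stratified_subsampling(), and the function name, the `label_key` parameter, and the example rows are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Return up to n_samples examples while keeping label proportions roughly intact.

    Hypothetical helper for illustration; not part of the mteb package.
    """
    if len(examples) <= n_samples:
        return list(examples)

    rng = random.Random(seed)

    # Group the examples by their label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    # Give each label a share of the budget proportional to its frequency.
    total = len(examples)
    quotas = {
        label: (len(group) * n_samples) // total
        for label, group in by_label.items()
    }

    # Integer division can leave a few slots unused; hand them to the
    # labels with the largest remainders so the total is exactly n_samples.
    leftover = n_samples - sum(quotas.values())
    by_remainder = sorted(
        by_label,
        key=lambda label: (len(by_label[label]) * n_samples) % total,
        reverse=True,
    )
    for label in by_remainder[:leftover]:
        quotas[label] += 1

    # Draw each label's quota without replacement, then shuffle the result.
    sample = []
    for label, group in by_label.items():
        sample.extend(rng.sample(group, quotas[label]))
    rng.shuffle(sample)
    return sample
```

A fixed seed keeps the subsample reproducible across runs, which matters when benchmark scores are compared between models.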

Contributor

@imenelydiaker imenelydiaker left a comment

LGTM! Thanks for this addition!
I left some comments; the intfloat/multilingual-e5-small results are also missing.
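For reference, results for both checklist models could be produced by issuing the checklist's mteb run -m {model_name} -t {task_name} command once per model. The sketch below only assembles those commands; the helper names are hypothetical, and the task name is an assumption taken from the test output later in this thread. Nothing is executed unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Model names come from the PR checklist.
MODELS = [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]
# Assumption: this is the task added by the PR (it appears in the test log).
TASK = "OnlineStoreReviewSentimentClassification"


def build_mteb_commands(models, task):
    """Build one `mteb run -m {model_name} -t {task_name}` argv list per model."""
    return [["mteb", "run", "-m", model, "-t", task] for model in models]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # Requires the mteb package (and its CLI) to be installed.
            subprocess.run(argv, check=True)


# Dry run: only prints the two commands.
run_commands(build_mteb_commands(MODELS, TASK))
```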

@imenelydiaker imenelydiaker self-assigned this May 14, 2024
@Ruqyai
Contributor Author

Ruqyai commented May 14, 2024

> LGTM! Thanks for this addition! I left some comments; the intfloat/multilingual-e5-small results are also missing.

I did

@Ruqyai Ruqyai requested a review from imenelydiaker May 14, 2024 18:14
Contributor

@imenelydiaker imenelydiaker left a comment

LGTM. I'll let you update the points with the correct reviewer's name, and then we'll be able to merge.

docs/mmteb/points/674.jsonl (outdated, resolved)
@Ruqyai
Contributor Author

Ruqyai commented May 15, 2024

Done. Is it OK? @imenelydiaker

@Ruqyai Ruqyai requested a review from imenelydiaker May 15, 2024 15:14
@Ruqyai
Contributor Author

Ruqyai commented May 15, 2024

Question:
As far as I know, today is the last day to contribute a new dataset:
https://github.com/embeddings-benchmark/mteb/tree/main/docs/mmteb
Can my contribution be counted as almost done, just waiting for the merge?

@imenelydiaker
@KennethEnevoldsen
@KranthiGV

@imenelydiaker
Contributor

> Question: As far as I know, today is the last day to contribute a new dataset: https://github.com/embeddings-benchmark/mteb/tree/main/docs/mmteb Can my contribution be counted as almost done, just waiting for the merge?
>
> @imenelydiaker @KennethEnevoldsen @KranthiGV

No worries about this, LGTM, let's merge.

@imenelydiaker imenelydiaker enabled auto-merge (squash) May 16, 2024 07:39
@imenelydiaker
Contributor

@Ruqyai can you please run make pr locally and push changes?

@Ruqyai
Contributor Author

Ruqyai commented May 16, 2024

> @Ruqyai can you please run make pr locally and push changes?

Done.

======================================================== short test summary info ========================================================
FAILED tests/test_TaskMetadata.py::test_all_metadata_is_filled - ValueError: The metadata of the following datasets is not filled: ['OnlineStoreReviewSentimentClassification']
FAILED tests/test_all_abstasks.py::test_dataset_availability - aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host huggingface.co:443 ssl:True [SSLCertVerificationEr...
================================== 2 failed, 527 passed, 149 skipped, 5 warnings in 181.77s (0:03:01) ===================================
make[1]: *** [test] Error 1
make: *** [pr] Error 2

I fixed them
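The first failure above reports a dataset whose metadata is not fully filled in. As a rough illustration of what such a check involves, here is a toy sketch; it is not the code in tests/test_TaskMetadata.py, and the `ToyMetadata` class and its field names are hypothetical stand-ins for the real metadata object.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyMetadata:
    """Hypothetical stand-in for a task's metadata object."""
    name: str
    description: Optional[str] = None
    license: Optional[str] = None
    date: Optional[str] = None


def unfilled_fields(meta):
    """Return the names of fields that are still None."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


def check_all_filled(all_metadata):
    """Raise ValueError naming every dataset with at least one unfilled field."""
    unfilled = [m.name for m in all_metadata if unfilled_fields(m)]
    if unfilled:
        raise ValueError(
            f"The metadata of the following datasets is not filled: {unfilled}"
        )
```

Running a helper like `unfilled_fields` on a single task before pushing shows which fields still need values, rather than only the dataset name reported in the test summary.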

@imenelydiaker imenelydiaker merged commit ba79fc8 into embeddings-benchmark:main May 16, 2024
7 checks passed
5 participants