Add SWIM-IR #617
Comments
Amazing. Feel free to open a PR :)
That'd be great indeed. cc @thakur-nandan
Thanks @Muennighoff. The SWIM-IR dataset would be a great addition, but it contains training splits only, since it is meant to be used for training. If that is still desirable, we can go ahead and add it to MTEB. Let me know if you need help, @rasdani.
Oh, does it still make sense to use it for evaluation, or not at all? I'm not sure adding a training dataset makes sense. cc @KennethEnevoldsen
I wouldn't add a dataset intended for training unless we expect it to evaluate an aspect that we are not currently evaluating.
Google released a new crosslingual retrieval dataset:
https://huggingface.co/datasets/nthakur/swim-ir-cross-lingual
We could turn a subset of this into a retrieval and reranking benchmark.
If no one picks this up, I can take a look at this during the weekend.
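For illustration only, here is a rough sketch of how a sampled subset could be turned into a BEIR-style retrieval layout (corpus, queries, and relevance judgments). It assumes each row is a dict with a `query` field and a `text` field holding the passage that query was generated from; those field names, the `build_retrieval_subset` helper, and the ID scheme are hypothetical and would need to be checked against the actual dataset schema.

```python
import random
from typing import Dict, Iterable, Tuple


def build_retrieval_subset(
    rows: Iterable[dict],
    n_queries: int,
    seed: int = 0,
    query_key: str = "query",  # assumed field name, verify against the dataset
    text_key: str = "text",    # assumed field name, verify against the dataset
) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, Dict[str, int]]]:
    """Sample rows and return (corpus, queries, qrels) dictionaries."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled = rng.sample(rows, min(n_queries, len(rows)))

    corpus: Dict[str, str] = {}
    queries: Dict[str, str] = {}
    qrels: Dict[str, Dict[str, int]] = {}
    text_to_doc_id: Dict[str, str] = {}

    for i, row in enumerate(sampled):
        text = row[text_key]
        # Reuse one document ID when several queries point at the same passage.
        doc_id = text_to_doc_id.setdefault(text, f"d{len(text_to_doc_id)}")
        corpus[doc_id] = text

        query_id = f"q{i}"
        queries[query_id] = row[query_key]
        # Each query is marked relevant only to its source passage.
        qrels[query_id] = {doc_id: 1}

    return corpus, queries, qrels
```

A reranking variant would additionally need candidate negatives per query, which this sketch does not attempt to produce.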