Add new dataset: GermanGovServiceRetrieval #731
Do you have time to review? @guenthermi @Muennighoff @PhilipMay
I think everything looks good. Feel free to add points. You might also consider using ndcg_5 instead of 10 since the dataset is quite small.
Good point. I changed the main metric.
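To illustrate why the cutoff matters, here is a minimal sketch of nDCG@k in plain Python. It is not the MTEB implementation. It assumes linear gain, a log2 rank discount, and that each query's list holds the relevance labels of all candidate documents in ranked order. The function names `dcg_at_k` and `ndcg_at_k` are made up for this example.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked relevance labels."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top hit is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, k):
    """nDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant document at all for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical query: the single relevant passage is ranked 7th.
ranking = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
score_at_5 = ndcg_at_k(ranking, 5)    # the hit falls outside the top 5
score_at_10 = ndcg_at_k(ranking, 10)  # the hit is counted, discounted by log2(8)
```

With a cutoff of 5 the hit at rank 7 contributes nothing, while a cutoff of 10 still gives partial credit, so the smaller cutoff is the stricter of the two.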
Hey @malteos. I have almost no knowledge of MTEB and have never implemented anything here. Sorry.
Have enabled auto-merge and updated the points. Let me know if you disagree. Thanks for the addition!
GermanGovServiceRetrieval: LHM-Dienstleistungen-QA is a German question-answering dataset for government services of the Munich city administration. It associates questions with a textual context containing the answer.
Checklist for adding MMTEB dataset
Reason for dataset addition: Domain-specific retrieval dataset for German
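Tested with the mteb package.
Models run using the mteb run -m {model_name} -t {task_name} command: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 and intfloat/multilingual-e5-small.
For large datasets: self.stratified_subsampling() under dataset_transform().
Tests: make test.
Formatting: make lint.
Points file named after the PR number (e.g. 438.jsonl).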
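As a small sketch of the model-run step, the snippet below builds the `mteb run -m {model_name} -t {task_name}` command for the two models named in the checklist. It only constructs and prints the commands and does not execute them. The task name is assumed to match the dataset name in the PR title, and `build_commands` is a helper made up for this example.

```python
import shlex

# The two models named in the checklist.
MODELS = [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]

# Assumption: the task is registered under the dataset name from the PR title.
TASK = "GermanGovServiceRetrieval"


def build_commands(models, task):
    """Return one argument list per model for `mteb run -m <model> -t <task>`."""
    return [["mteb", "run", "-m", model, "-t", task] for model in models]


commands = build_commands(MODELS, TASK)
for cmd in commands:
    # shlex.join renders the argument list as a copy-pasteable shell line.
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` in an environment where the `mteb` package is installed.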