Conversation Retrieval Task and TopiOCQA dataset #714
Conversation
Looks really good overall, think it's almost there! Excited to have these added 🚀
Based on the current changes, here are the scenarios now:
Doing both 2. and 3. provides maximum flexibility and options to the user.
I like the plan to extend it into the cross-encoders! Can we collapse 2 and 3 into the same
I thought about that but there are a couple of issues here.
What do you think about these concerns?
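The default behavior under discussion, collapsing a multi-turn conversation into a single query string before encoding, could be sketched roughly as below. The helper name `join_conversation` and the separator are illustrative assumptions, not mteb's actual implementation:

```python
from typing import List

def join_conversation(turns: List[str], sep: str = " ; ") -> str:
    """Collapse a multi-turn conversation into one query string.

    Hypothetical helper mirroring the shared default conv -> query
    step discussed in this PR; mteb's real logic may differ.
    """
    return sep.join(t.strip() for t in turns if t.strip())

# Example: a two-turn conversation becomes a single retrieval query.
conversation = ["Who wrote Dune?", "When was it published?"]
query = join_conversation(conversation)
```

Keeping this step in one place (rather than duplicated in both the dense and cross-encoder paths) is the kind of de-duplication the commit history refers to.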
Summary from a brief chat we had, for note-taking purposes: we will keep both. I think all that's left is adding the results files and the points files @vaibhavad. Thanks for the help!
@orionw - This is ready to be merged! Thanks for all the support and feedback throughout the process.
Thanks @vaibhavad! LGTM. Two very minor comments/questions:
Thanks, I added 4 points under bug fixes. I am also working with @xhluca, and the conversational retrieval workflow may change slightly to include the role of the speaker as well (assistant, user, etc.), but it is best to take that up in a different PR. Regarding the scores, they look okay to me. The corpus is 25 million passages, so the random score will be way worse than this. Models trained on conversational retrieval should get scores in the 0.6-0.8 range (ballpark), so 0.1 seems a reasonable score for a model which is not trained on this task but can still do some surface-level matching. Also, the first question of each conversation is a decontextualized question, similar to many QA datasets that these models have been trained on, so some performance may be coming from there. Feel free to merge the branch :) Thanks again for all the help and support.
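Including the speaker role in each turn, as proposed for the follow-up PR, might look like the sketch below. The `role: utterance` formatting is purely illustrative; the eventual design is explicitly deferred to a later PR:

```python
from typing import List, Tuple

def format_conversation_with_roles(turns: List[Tuple[str, str]]) -> str:
    """Render (role, utterance) pairs into a single string such as
    'user: ... assistant: ...'. Hypothetical format, not mteb's."""
    return " ".join(f"{role}: {utt}" for role, utt in turns)

# Example history with alternating speaker roles.
history = [
    ("user", "Who wrote Dune?"),
    ("assistant", "Frank Herbert."),
    ("user", "When was it published?"),
]
role_query = format_conversation_with_roles(history)
```

Exposing roles this way would let models distinguish the user's information need from the assistant's prior answers when forming the retrieval query.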
Awesome! I have a similar dataset (as @vaibhavad mentioned). Is it possible for me to request both of you (@orionw @vaibhavad) as reviewers for my next PR?
…k#714)

* conv retrieval evaluation
* add encode_conversations to DenseRetrievalExactSearch
* remove duplication of default logic of conv -> query
* fix: queries has been converted from dict to list by this
* make it cross encoder compatible
* topiocqa dataset
* dont need a separate function
* metadata, full corpus
* more metadata
* baseline results
* add points
* update points
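The "make it cross encoder compatible" change above implies that, once a conversation is collapsed into a query, it is scored against candidate passages as (query, passage) pairs. A minimal sketch of that shape, where `toy_predict` is a token-overlap stand-in for a real cross-encoder's predict function:

```python
from typing import Callable, List, Tuple

def score_candidates(
    query: str,
    passages: List[str],
    predict: Callable[[List[Tuple[str, str]]], List[float]],
) -> List[Tuple[str, float]]:
    """Score (query, passage) pairs with a cross-encoder-style
    predict() and return passages sorted by descending score."""
    scores = predict([(query, p) for p in passages])
    return sorted(zip(passages, scores), key=lambda x: -x[1])

# Toy predict(): word-overlap count stands in for a real model.
def toy_predict(pairs):
    return [len(set(q.lower().split()) & set(p.lower().split()))
            for q, p in pairs]

ranked = score_candidates(
    "dune frank herbert",
    ["frank herbert wrote dune", "unrelated text"],
    toy_predict,
)
```

Because the conversation is flattened to a plain query string first, the same downstream search code can serve both bi-encoder and cross-encoder evaluation.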
Checklist for adding MMTEB dataset

Reason for dataset addition:

- [ ] I have tested that the dataset runs with the mteb package.
- [ ] I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
  - [ ] sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  - [ ] intfloat/multilingual-e5-small
- [ ] If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
- [ ] I have run tests locally to make sure nothing is broken using make test.
- [ ] I have run the formatter to format the code using make lint.
- [ ] I have added points for my submission to the points folder using the PR number as the filename (e.g. 438.jsonl).