Replies: 1 comment
I'm sure all evaluation datasets have some wrong labels, unfortunately. At least we evaluate all models on the same datasets, so the hope is that these errors won't affect the final ranking. That being said, any SweReC-specific fixes are of course welcome, and the most appropriate place for them would be a PR in the original source repo. If any such PRs are merged, we can refresh the ScandEval SweReC dataset (and re-evaluate all models on the improved dataset).
🐛 Describe the bug
Some of the labels in SweReC are probably wrong.
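For context, a quick way to eyeball the labels is to load the dataset and print a few examples alongside their assigned labels. This is only a sketch: the Hub dataset ID `ScandEval/swerec-mini` and the `text`/`label` field names are assumptions and may need adjusting to whatever the ScandEval copy of SweReC actually uses.

```python
# Minimal sketch for spot-checking SweReC labels.
# Assumption: the ScandEval copy of SweReC lives on the Hugging Face Hub
# as "ScandEval/swerec-mini" with "text" and "label" columns.
from datasets import load_dataset

dataset = load_dataset("ScandEval/swerec-mini", split="test")

# Print a handful of review texts with their sentiment labels,
# so suspicious assignments can be flagged manually.
for example in dataset.select(range(10)):
    print(f"{example['label']:>10} | {example['text'][:120]}")
```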
Operating System: Linux
Device: CUDA GPU
Python version: 3.10.x
ScandEval version: 12.7.0