
Possible Bug In Handling Batch Size During Common Sense Evaluation #61

Open
mchorton opened this issue Apr 2, 2024 · 1 comment

@mchorton

mchorton commented Apr 2, 2024

I am debugging poor performance of a model I'm experimenting with. It gets pretty good CoreEN scores, but it is generating nonsensical responses when running commonsense_evaluate.py. For instance, it gives repeated tokens for a lot of inputs.

After some more digging, it looks like this generation call is causing a problem when the batch size is greater than 1.

In this case, padding tokens will be added to many of the batch elements. The generate() call isn't given an indication of how many padding tokens are being used. This causes my model to generate garbage outputs in cases where lots of padding appears in a batch. If I change the batch size to 1, outputs are much more reasonable.
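For reference, here is a minimal sketch (not the repo's actual code) of how batched generation is usually made padding-aware with the Hugging Face transformers API: left-pad the prompts and pass the tokenizer's attention_mask through to generate(). The model name and prompts below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM is handled the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Decoder-only models are typically left-padded for batched generation.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "A short prompt.",
    "A much longer prompt that forces padding elsewhere in the batch.",
]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],  # tells generate() which tokens are padding
        max_new_tokens=32,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

If the attention mask is omitted (or the batch is right-padded), the model ends up attending to pad tokens, which is consistent with the garbage outputs described above when the batch size is greater than 1.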

It seems like this could be the cause of #38. In that issue, users are evaluating with batch sizes greater than 1, which seems likely to trigger the same problem.

Also, FWIW, I am not sure why commonsense_evaluate.py allows users to choose a batch size while evaluate.py does not. I'm guessing that's why I'm seeing issues about commonsense_evaluate.py but not evaluate.py.

@HZQ950419
Collaborator

Hi,
Many thanks for pointing out this issue! I added batch decoding to commonsense_evaluate.py for acceleration, since the target responses for the commonsense tasks are very short. But the inputs for the commonsense tasks can be very long, so I used batch_size=1 for my own experiments, which is why I didn't encounter this issue.

I'm trying to figure out a solution to this issue. If you have a fix in mind, please feel free to submit a PR.
