The computation of neg_reward is wrong #16

Open
zzxn opened this issue Jul 26, 2020 · 1 comment

Comments

zzxn commented Jul 26, 2020

This code uses the batch-averaged (sample_rouge - baseline rouge), but that doesn't make sense mathematically. This term should be sample-wise, because what we really want to maximize is this:
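The formula referenced here did not survive in the text, so the following is only a rough sketch of the difference being described. It assumes the usual self-critical policy-gradient loss, where each sample's log-probability is weighted by its own (sample ROUGE minus baseline ROUGE). The function names and toy numbers are hypothetical and are not taken from the repository.

```python
# Hypothetical illustration; names and values are assumptions, not the repository's code.

def batch_averaged_loss(log_probs, sample_rouge, baseline_rouge):
    """Weight every sample's log-probability by one shared, batch-mean reward difference."""
    n = len(log_probs)
    mean_adv = sum(s - b for s, b in zip(sample_rouge, baseline_rouge)) / n
    # Negative sign: minimizing this loss maximizes the reward-weighted log-probability.
    return -mean_adv * sum(log_probs) / n


def sample_wise_loss(log_probs, sample_rouge, baseline_rouge):
    """Weight each sample's log-probability by its own reward difference."""
    n = len(log_probs)
    weighted = sum(
        (s - b) * lp for lp, s, b in zip(log_probs, sample_rouge, baseline_rouge)
    )
    return -weighted / n


# Toy batch of two samples: the first beats its baseline, the second does not.
log_probs = [-1.0, -3.0]
sample_rouge = [0.5, 0.25]
baseline_rouge = [0.25, 0.5]

print(batch_averaged_loss(log_probs, sample_rouge, baseline_rouge))
print(sample_wise_loss(log_probs, sample_rouge, baseline_rouge))
```

In this toy batch the two reward differences cancel in the mean, so the batch-averaged version gives no learning signal at all, while the sample-wise version still pushes up the first sample and pushes down the second.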

@saiprabhakar

Check #7. The negative sign is already included in the LogP, so the author has reversed it in the reward.
