About rl_loss #7

Open
xcfcode opened this issue Feb 16, 2019 · 0 comments
xcfcode commented Feb 16, 2019

Thank you for this excellent work. I still have a question about rl_loss. In the code, rl_loss = neg_reward * sample_out.loss, where neg_reward is computed as greedy_rouge - sample_rouge and sample_out.loss is the cross-entropy loss, i.e. -logP(). However, in the paper, the self-critical policy gradient training algorithm uses logP(), which confuses me. Could you please explain this?
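
For concreteness, this is a minimal sketch of how I read the loss computation (greedy_rouge, sample_rouge, neg_reward, and rl_loss follow the names above; the log-probability tensor and its shape are my assumption):

```python
import torch

def rl_loss_sketch(sample_log_probs: torch.Tensor,  # [seq_len] log P of sampled tokens (assumed shape)
                   sample_rouge: float,
                   greedy_rouge: float) -> torch.Tensor:
    # sample_out.loss: cross-entropy of the sampled sequence, i.e. -logP()
    nll = -sample_log_probs.sum()
    # neg_reward = greedy_rouge - sample_rouge
    neg_reward = greedy_rouge - sample_rouge
    # rl_loss = neg_reward * sample_out.loss
    return neg_reward * nll
```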


Update
I have read the SeqGAN code. According to the policy gradient, the loss is computed as loss += -out[j][target.data[i][j]] * reward[j], where out is the log_softmax output, so the author adds the "-" sign in order to use gradient descent later.
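
In other words, the sign flip is what lets a maximization objective be trained with a standard optimizer. A small batched sketch of that SeqGAN-style loss (the function and variable names here are mine, not from either repository):

```python
import torch
import torch.nn.functional as F

def pg_loss(logits: torch.Tensor,    # [batch, seq_len, vocab] decoder outputs
            targets: torch.Tensor,   # [batch, seq_len] sampled token ids
            rewards: torch.Tensor):  # [batch] per-sequence reward
    # out = log_softmax over the vocabulary
    log_probs = F.log_softmax(logits, dim=-1)
    # gather log P of the sampled tokens: out[j][target[i][j]]
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # minus sign: maximizing reward * logP becomes minimizing -reward * logP,
    # which gradient descent can handle directly
    return -(tok_logp.sum(dim=1) * rewards).mean()
```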
