Bug of PPO #1072

Open
GIS-PuppetMaster opened this issue Mar 13, 2020 · 4 comments
Comments

@GIS-PuppetMaster

ratio = tf.exp(pi.log_prob(action) - old_pi.log_prob(action))
surr = ratio * adv
...
loss = -tf.reduce_mean(tf.minimum(surr, tf.clip_by_value(ratio, 1. - self.epsilon, 1. + self.epsilon) * adv))

tf.minimum should take ratio rather than surr: since surr = ratio * adv and adv can be negative, tf.minimum may return a value like -1e10, which makes the actor's loss blow up.
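
A minimal numeric sketch of the behaviour described above (hypothetical numbers, plain NumPy instead of the repo's TF graph): when adv is negative and the ratio blows up, the repo's minimum over the two surrogates keeps the huge term, while clipping the ratio before multiplying stays bounded.

import numpy as np

# Hypothetical values (not taken from the issue): the probability ratio has
# exploded for an action whose advantage is negative.
ratio = 1e10
adv = -1.0
epsilon = 0.2

clipped = np.clip(ratio, 1. - epsilon, 1. + epsilon)   # -> 1.2

# Objective as written in the repo: minimum over the two surrogates.
surr_repo = min(ratio * adv, clipped * adv)             # -> -1e10
loss_repo = -np.mean(surr_repo)                         # -> 1e10, loss explodes

# Variant proposed in this issue: clip the ratio first, then multiply.
surr_clip_first = min(ratio, clipped) * adv             # -> -1.2
loss_clip_first = -np.mean(surr_clip_first)             # -> 1.2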

@GIS-PuppetMaster
Author

It should be like this:
self.cliped_ratio = tf.clip_by_value(self.ratio, 1. - METHOD['epsilon'], 1. + METHOD['epsilon'])
self.min_temp = tf.minimum(self.ratio, self.cliped_ratio)
self.aloss = -tf.reduce_mean(self.min_temp * self.tfadv)

@quantumiracle
Member

Why does the negative value cause a failure in the actor loss?
You can also refer to OpenAI baselines here, which follows a similar process to our repo.
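
For reference, the clipped policy-gradient loss in baselines' ppo2 is roughly of this form (paraphrased; the function and variable names here are mine, not a verbatim excerpt). The two surrogates are negated and combined with a maximum, which is equivalent to negating the minimum of the two:

import tensorflow as tf

def ppo2_style_pg_loss(adv, ratio, cliprange):
    # max over the negated surrogates == -min(ratio * adv, clip(ratio) * adv)
    pg_losses = -adv * ratio
    pg_losses2 = -adv * tf.clip_by_value(ratio, 1.0 - cliprange, 1.0 + cliprange)
    return tf.reduce_mean(tf.maximum(pg_losses, pg_losses2))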

@GIS-PuppetMaster
Author

GIS-PuppetMaster commented Mar 13, 2020

Why does the negative value cause a failure in the actor loss?
You can also refer to OpenAI baselines here, which follows a similar process to our repo.

I drew the loss plot and the reward plot; when there is a very small negative value, such as 1e-10, the loss becomes extremely large compared to normal, and the reward stops increasing.
I just tried a lower learning rate, and no such 1e-10 value appeared.
I wonder whether using my code above would give the same result, since it's more robust.

@quantumiracle
Member

quantumiracle commented Jan 6, 2021

Sorry for the late reply. What you mentioned might be caused by some numerical issues in tf.minimum if I understood correctly. Could you please print out an example case and paste it here? I'm a bit confused by your description since you mentioned both large negative value (-1e10) and small positive value (1e-10). A case showing how it causes a large loss value would be great.
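
A small diagnostic sketch for capturing such a case, assuming a TF1-style setup with the tensors named as in the snippet above (sess and feed_dict come from the existing training loop; this is illustrative, not the repo's code):

# self.ratio, self.tfadv, self.min_temp, self.aloss are the tensors from the snippet above.
ratio_v, adv_v, min_v, loss_v = sess.run(
    [self.ratio, self.tfadv, self.min_temp, self.aloss], feed_dict=feed_dict)
if loss_v > 1e3:  # arbitrary threshold for "the loss exploded"
    print('ratio min/max:', ratio_v.min(), ratio_v.max())
    print('adv min/max:', adv_v.min(), adv_v.max())
    print('min term min/max:', min_v.min(), min_v.max())
    print('actor loss:', loss_v)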
