NaN appears in PPO #187

Open
xxx-007 opened this issue Nov 9, 2020 · 2 comments

Comments

xxx-007 commented Nov 9, 2020

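Hello Mr. Morvan (莫烦). When I run the simple_ppo algorithm, the line that selects an action from the current state, a=self.sess.run(self.sample_op,{self.tfs:s})[0], returns nan as the action. How should I modify the code so that nan values no longer appear while it runs?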
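A nan action generally means that some value feeding the sampling op was already non-finite, for example the state, or the mean or standard deviation produced by the policy network. The thread does not identify which one, so a first step could be to check each fetched value in turn. The sketch below is plain Python and assumes the values have already been fetched from the session as flat lists of floats; `first_non_finite` and the sample data in `fetched` are hypothetical names and numbers used only for illustration.

```python
import math

def first_non_finite(named_values):
    """Return (name, index, value) of the first NaN/inf entry, or None if all are finite.

    `named_values` maps a label (e.g. "state", "mu", "sigma") to a flat
    sequence of floats. The labels are illustrative, not names from simple_ppo.
    """
    for name, values in named_values.items():
        for i, v in enumerate(values):
            if not math.isfinite(v):
                return name, i, v
    return None

# Hypothetical values standing in for what a session fetch might return.
fetched = {
    "state": [0.1, -0.3, 0.7],
    "mu": [0.5],
    "sigma": [float("nan")],
}

hit = first_non_finite(fetched)
if hit is not None:
    name, index, value = hit
    print("first non-finite value found in", name, "at index", index)
```

Running a check like this on every step shows the earliest point at which a nan appears, which is usually more informative than the step at which the sampled action finally becomes nan.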

xxx-007 commented Nov 9, 2020

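In the init function, an epsilon should be added to the denominator of the following line to prevent nan:
ratio = self.pi.prob(self.tfa) / self.old_pi.prob(self.tfa)
That is, change it to:
ratio = self.pi.prob(self.tfa) / (self.old_pi.prob(self.tfa)+EPS)

I applied this suggestion, but nan still appears after the change.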
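One possible reason the EPS change does not help, offered as an assumption rather than a confirmed diagnosis: EPS only guards against dividing by zero. If the distribution parameters themselves are already nan, every form of the ratio is nan too. The plain-Python sketch below compares the patched ratio with a log-space version for a scalar Normal policy. The functions `normal_log_prob`, `ratio_direct`, and `ratio_log_space`, the value of `EPS`, and the `max_log_ratio` clamp are all illustrative choices, not code from simple_ppo.

```python
import math

EPS = 1e-8  # assumed value; the thread does not state what EPS is set to

def normal_log_prob(x, mu, sigma):
    """Log-density of a scalar Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def ratio_direct(x, mu, sigma, old_mu, old_sigma):
    """pi(a) / (old_pi(a) + EPS), mirroring the patched line."""
    p = math.exp(normal_log_prob(x, mu, sigma))
    p_old = math.exp(normal_log_prob(x, old_mu, old_sigma))
    return p / (p_old + EPS)

def ratio_log_space(x, mu, sigma, old_mu, old_sigma, max_log_ratio=20.0):
    """exp(log pi(a) - log old_pi(a)), with the exponent clamped."""
    diff = normal_log_prob(x, mu, sigma) - normal_log_prob(x, old_mu, old_sigma)
    if math.isnan(diff):
        return diff  # NaN parameters still propagate; clamping cannot repair them
    diff = max(-max_log_ratio, min(max_log_ratio, diff))
    return math.exp(diff)

# Identical policies, action far in the tail: both densities underflow to 0.
print(ratio_direct(100.0, 0.0, 1.0, 0.0, 1.0))     # 0.0, although the true ratio is 1
print(ratio_log_space(100.0, 0.0, 1.0, 0.0, 1.0))  # 1.0

# A NaN standard deviation makes both versions NaN, with or without EPS.
bad_sigma = float("nan")
print(math.isnan(ratio_direct(0.0, 0.0, bad_sigma, 0.0, 1.0)))     # True
print(math.isnan(ratio_log_space(0.0, 0.0, bad_sigma, 0.0, 1.0)))  # True
```

If `self.pi` and `self.old_pi` are TensorFlow distribution objects, as the calls to `prob` suggest, the log-space form would correspond to using their `log_prob` method and exponentiating the difference. Whether that removes the nan in this case depends on where the first non-finite value originates, which the thread leaves open.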

wagh311 commented Mar 8, 2024

Did you eventually solve this problem?
