/chapter4/chapter4_questions&keywords #53
Comments
Therefore, is $\nabla p_{\theta}(\tau)=\nabla \log p_{\theta}\left(a_{t}^{n} \mid s_{t}^{n}\right)$ written incorrectly?
Thanks for your comment. The formula is correct as written; the detailed derivation can be found in the tutorial, Chapter 4, "Policy Gradient".
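The step being asked about relies on the log-derivative trick. A sketch of the derivation, assuming the tutorial's notation ($p_{\theta}$ for the policy-induced trajectory distribution, $\bar{R}_{\theta}$ for the expected return, $R(\tau)$ for the return of trajectory $\tau$, and $N$ sampled trajectories), might look like:

```latex
% Log-derivative trick:
\nabla p_{\theta}(\tau) = p_{\theta}(\tau)\,\nabla \log p_{\theta}(\tau)

% Applied to the expected return, then approximated by sampling:
\nabla \bar{R}_{\theta}
  = \mathbb{E}_{\tau \sim p_{\theta}}\!\left[ R(\tau)\,\nabla \log p_{\theta}(\tau) \right]
  \approx \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_{n}}
      R\!\left(\tau^{n}\right)\,
      \nabla \log p_{\theta}\!\left(a_{t}^{n} \mid s_{t}^{n}\right)
```

Only the terms of $\log p_{\theta}(\tau)$ that depend on $\theta$ (the per-step action probabilities) survive the gradient, which is why the trajectory gradient reduces to a sum of $\nabla \log p_{\theta}(a_{t}^{n} \mid s_{t}^{n})$ terms.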
Thank you! Thanks♪(・ω・)ノ
In the keywords section, would it be better to write "Reinforce" in all caps as "REINFORCE"? That would be more consistent with the earlier notes.
Yes, indeed. Here REINFORCE denotes a classic reinforcement learning algorithm based on the policy gradient with per-episode (Monte Carlo) updates, and should be distinguished from "Reinforce". Thanks for the suggestion; it has been corrected.
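Since REINFORCE comes up here, a minimal sketch may help illustrate what "per-episode Monte Carlo update" means in practice. This is not the tutorial's code; it is an illustrative softmax-policy REINFORCE on a hypothetical two-armed bandit (episode length 1, so the return $G$ equals the immediate reward), with all names chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)              # action preferences (policy parameters)
rewards = np.array([0.0, 1.0])   # hypothetical bandit: action 1 is strictly better
alpha = 0.1                      # learning rate

def softmax(x):
    z = np.exp(x - x.max())      # shift for numerical stability
    return z / z.sum()

for episode in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)   # sample an action from the current policy
    G = rewards[a]               # Monte Carlo return of this one-step episode
    # Gradient of log softmax w.r.t. theta: one_hot(a) - probs
    grad_log = -probs
    grad_log[a] += 1.0
    theta += alpha * G * grad_log  # REINFORCE update: G * grad of log-probability

print(softmax(theta))  # probability mass should concentrate on action 1
```

The key property REINFORCE illustrates: the update only uses complete-episode returns and the gradient of the log-probability of the sampled actions, so no value function is needed.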
Policy Gradient |
Is it just me, or is the notation inconsistent? The policy is sometimes written as $p$ and sometimes as $\pi$, and the notation also differs from the first three chapters.
Using $p$ to denote the policy is intended to make things easier for readers to follow; we will consider unifying the notation later (with corresponding annotations).
https://datawhalechina.github.io/easy-rl/#/chapter4/chapter4_questions&keywords