/chapter6/chapter6_questions&keywords #56

Open
qiwang067 opened this issue May 24, 2021 · 2 comments

Comments

@qiwang067
Contributor

https://datawhalechina.github.io/easy-rl/#/chapter6/chapter6_questions&keywords

Description

@Strawberry47

Thanks♪(・ω・)ノ

@cp-Aurora

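I don't quite understand this part. Exercise 6-3 (4) says: "So for the temporal-difference method, $r$ is a random variable." Exercise 6-5 says: "We want the loss value from subtracting the two to be as close as possible to $r_t$. This is also the network's optimization objective, which we call the loss function." So is DQN's optimization objective to fit a random variable?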
