Different explanation about the argument ignore_done #610

Answered by zjowowen
Seraphli asked this question in Q&A
The argument ignore_done was introduced for calculating Q-values.

The boolean done returned by gym's step() can be true for two reasons. Most of the time, the env has terminated because some defined failure condition was met. Sometimes, done is true because the number of env steps has reached its time limit.

OpenAI gym is a rather unstable env manager whose API changes from time to time, which makes maintenance a real burden. Before version 0.25.0, the gym env did not give a useful signal to tell these two cases apart.
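To make the two cases concrete, here is a small sketch of how calling code might separate them. It assumes the result of step() is either the older 4-tuple (obs, reward, done, info) or the 5-tuple (obs, reward, terminated, truncated, info) used by newer gym releases. The helper name `split_done` is hypothetical and not part of gym.

```python
def split_done(step_result):
    """Return (terminated, truncated) from an env.step() result tuple.

    Hypothetical helper for illustration; not part of gym.
    """
    if len(step_result) == 5:
        # Newer API: terminated and truncated are reported separately.
        _, _, terminated, truncated, _ = step_result
        return bool(terminated), bool(truncated)
    # Older API: a single done flag; its cause is not reported here,
    # so truncated is returned as None (unknown).
    _, _, done, _ = step_result
    return bool(done), None


# Old-style result: we only know the episode ended, not why.
print(split_done(("obs", 1.0, True, {})))
# New-style result: the episode ended because of the time limit.
print(split_done(("obs", 1.0, False, True, {})))
```

With the older tuple the caller cannot tell why the episode ended, which is the ambiguity described above.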

But these two cases are different. In the Bellman equation, target-value = r + next-state-value. If the env is truly terminated, the next state is an end-of-game state whose value is zero. If env is not …
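The effect on the target can be sketched in a few lines. This is a minimal illustration, not the library's actual implementation: the helper `td_target`, the discount factor `gamma`, and the reading of `ignore_done` as "keep bootstrapping from the next state even when done is true" are assumptions made for this example. With `gamma=1` the target reduces to the r + next-state-value form used in the text.

```python
def td_target(reward, next_value, done, gamma=0.99, ignore_done=False):
    """One-step TD target: r + gamma * mask * V(next_state).

    Hypothetical helper for illustration only.
    """
    # When done is respected, a finished episode contributes no future value.
    # When ignore_done is set (assumed semantics), done is treated as False,
    # so the next-state value is still used, e.g. for time-limit cut-offs.
    mask = 0.0 if (done and not ignore_done) else 1.0
    return reward + gamma * mask * next_value


# True termination: next-state value is dropped.
print(td_target(1.0, 10.0, done=True, gamma=0.5))                    # 1.0
# Same transition with ignore_done: next-state value is kept.
print(td_target(1.0, 10.0, done=True, gamma=0.5, ignore_done=True))  # 6.0
```

For transitions where done is false, the flag makes no difference in this sketch.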

Answer selected by Seraphli