Do something about the actor-critic Coursera assignment #398

Open
dniku opened this issue Apr 28, 2020 · 4 comments

dniku (Collaborator) commented Apr 28, 2020

Currently, week5_policy_based/practice_a3c.ipynb has numerous problems.

  • It does not implement A3C. It is a plain actor-critic.
  • We only have it in TensorFlow, since it does not have a corresponding assignment in master (it is a heavily modified version of master/week08/practice_pomdp, which was never originally intended to be an actor-critic assignment).

The difficulty in fixing this is that the videos leading up to this assignment talk about A3C a lot.

@dniku dniku added the coursera label Apr 28, 2020
@dniku dniku changed the title Add PyTorch version of actor-critic Coursera assignment Do something about the actor-critic Coursera assignment Apr 30, 2020
@dniku dniku added this to the Coursera 1.0 milestone May 7, 2020
AI-Ahmed (Contributor) commented Nov 13, 2022

@dniku, have you done anything about this problem so far? I am also facing related issues in the week08 assignment. I am trying to fix both now; the policy loss and the reward are not correct in either notebook (week08 and week06), although I have done everything I can to fix them. I want to know whether the problem lies in the atari_util.py file rather than in our code.

AI-Ahmed (Contributor) commented

Some screenshots of both cases:

Week 06: [screenshot of training plots]

Week 08: [screenshot of training plots]

dniku (Collaborator, Author) commented Nov 13, 2022

@AI-Ahmed

We haven't done anything about this assignment yet. Note, however, that this issue is about the Coursera assignment specifically, not the ones in the master branch, which I assume you are talking about.

Your screenshots of plots seem to indicate that your agent isn't learning anything at all and is behaving randomly. I'd guess that the reason is some bug in your code, e.g. a minus sign missing before the loss. You may want to refer to some open-source implementation of A2C (e.g. this one) to compare it with yours and possibly spot some errors.
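To illustrate the sign convention mentioned above, here is a minimal sketch of an advantage actor-critic loss in PyTorch. The function and variable names (`a2c_loss`, `logits`, `returns`, etc.) are illustrative, not the notebook's actual code:

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Minimal advantage actor-critic loss (sketch).

    logits:  (batch, n_actions) policy network output
    values:  (batch,) state-value estimates V(s)
    actions: (batch,) actions actually taken (long tensor)
    returns: (batch,) bootstrapped returns used as value targets
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pi_a = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Advantage is detached: the critic is trained by the value loss only.
    advantage = (returns - values).detach()

    # We *maximize* E[log pi(a|s) * A], so the loss carries a minus sign.
    # Dropping this minus sign makes the agent minimize expected return,
    # which looks exactly like "behaving randomly or worse" on the plots.
    policy_loss = -(log_pi_a * advantage).mean()

    value_loss = F.mse_loss(values, returns)

    # Entropy bonus is also maximized, hence subtracted from the loss.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```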

AI-Ahmed (Contributor) commented Nov 25, 2022

Hello @dniku,
The problem is now solved in both notebooks. In both cases it came from not multiplying the next-state value used in the value target by is_not_done. Without that mask, the target kept bootstrapping from the imperfect value function past episode boundaries instead of collapsing to the immediate reward at terminal states, so the agent behaved as if the episode never ended, and the value estimates for all states converged to roughly the same value.
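For reference, a minimal sketch of the masked one-step value target described above (the names `td_target`, `next_values`, and `is_not_done` are illustrative, not the assignment's exact code):

```python
import torch

def td_target(rewards, next_values, is_not_done, gamma=0.99):
    """One-step TD target: r + gamma * V(s') for non-terminal transitions.

    rewards:     (batch,) immediate rewards r_t
    next_values: (batch,) critic estimates V(s_{t+1})
    is_not_done: (batch,) 1.0 if the episode continues, 0.0 at terminal steps
    """
    # Masking the bootstrap term zeroes out V(s') at episode ends,
    # so the target reduces to the immediate reward there. Without the
    # mask, the target bootstraps past terminal states and the agent
    # behaves as if the episode never ends.
    return rewards + gamma * next_values * is_not_done
```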
