Do something about the actor-critic Coursera assignment #398

Open
dniku opened this issue Apr 28, 2020 · 4 comments

dniku (Collaborator) commented Apr 28, 2020

Currently, week5_policy_based/practice_a3c.ipynb has numerous problems.

  • It does not implement A3C. It is a plain actor-critic.
  • We only have it in TensorFlow, since it does not have a corresponding assignment in master (it is a heavily modified version of master/week08/practice_pomdp, which was never originally intended to be an actor-critic assignment).

The difficulty in fixing this is that the videos leading up to this assignment talk about A3C a lot.

@dniku dniku added the coursera label Apr 28, 2020
@dniku dniku changed the title Add PyTorch version of actor-critic Coursera assignment Do something about the actor-critic Coursera assignment Apr 30, 2020
@dniku dniku added this to the Coursera 1.0 milestone May 7, 2020
AI-Ahmed (Contributor) commented Nov 13, 2022

@dniku, have you done anything about this problem so far? I am also facing related issues in the week08 assignment. I am trying to fix both now; the policy loss and the reward are not correct in either notebook (week08 and week06), although I have done everything I can to fix them. I want to know whether the problem lies in the atari_util.py file rather than in our code.

AI-Ahmed (Contributor) commented

Some screenshots of both cases:

Week 06: [screenshot of training plots]

Week 08: [screenshot of training plots]

dniku (Collaborator, Author) commented Nov 13, 2022

@AI-Ahmed

We haven't done anything about this assignment yet. Note, however, that this issue is about the Coursera assignment specifically, not the ones in the master branch, which I assume you are talking about.

Your screenshots of plots seem to indicate that your agent isn't learning anything at all and is behaving randomly. I'd guess that the reason is some bug in your code, e.g. a minus sign missing before the loss. You may want to refer to some open-source implementation of A2C (e.g. this one) to compare it with yours and possibly spot some errors.
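To illustrate the sign convention mentioned above, here is a minimal sketch of an advantage actor-critic loss in PyTorch. The function and variable names (`a2c_loss`, `logits`, `returns`, etc.) are illustrative, not the notebook's actual code:

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Minimal advantage actor-critic loss (sketch).

    logits:  (batch, n_actions) policy network output
    values:  (batch,) state-value estimates V(s)
    actions: (batch,) actions actually taken (long tensor)
    returns: (batch,) bootstrapped returns used as value targets
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pi_a = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Advantage is detached: the critic is trained by the value loss only.
    advantage = (returns - values).detach()

    # We *maximize* E[log pi(a|s) * A], so the loss carries a minus sign.
    # Dropping this minus sign makes the agent minimize expected return,
    # which looks exactly like "behaving randomly or worse" on the plots.
    policy_loss = -(log_pi_a * advantage).mean()

    value_loss = F.mse_loss(values, returns)

    # Entropy bonus is also maximized, hence subtracted from the loss.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```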

AI-Ahmed (Contributor) commented Nov 25, 2022

Hello @dniku,
The problem is now solved in both notebooks. In both cases it came from not multiplying the next-state value used in the value target by is_not_done. Without that mask, the target kept bootstrapping from the imperfect value function past episode boundaries instead of collapsing to the immediate reward at terminal states, so the agent behaved as if the episode never ended, and the value estimates for all states converged to roughly the same value.
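For reference, a minimal sketch of the masked one-step value target described above (the names `td_target`, `next_values`, and `is_not_done` are illustrative, not the assignment's exact code):

```python
import torch

def td_target(rewards, next_values, is_not_done, gamma=0.99):
    """One-step TD target: r + gamma * V(s') for non-terminal transitions.

    rewards:     (batch,) immediate rewards r_t
    next_values: (batch,) critic estimates V(s_{t+1})
    is_not_done: (batch,) 1.0 if the episode continues, 0.0 at terminal steps
    """
    # Masking the bootstrap term zeroes out V(s') at episode ends,
    # so the target reduces to the immediate reward there. Without the
    # mask, the target bootstraps past terminal states and the agent
    # behaves as if the episode never ends.
    return rewards + gamma * next_values * is_not_done
```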
