Suggestion: implement some "tricks" that improve performance #266

Open
henrycharlesworth opened this issue Feb 10, 2021 · 1 comment

henrycharlesworth commented Feb 10, 2021

Given how popular this repo is (and rightly so), I think it would be a good idea to implement some simple tricks that have been shown to improve the performance of on-policy RL algorithms. I'm thinking mostly of this paper: https://arxiv.org/pdf/2006.05990.pdf, a large-scale study of the many small implementation decisions that can make a big difference in performance.

I haven't run extensive experiments, but I've implemented a couple of the things they mention and they do seem to boost performance significantly. In particular, recomputing the advantages at every epoch of the update, as the paper recommends, does seem to help. An even simpler change to the initialisation seems to make an even bigger difference: for continuous control, initialise the action std so that its initial value is 0.5 in each dimension, and multiply the weights of the policy output layer by 0.01 at the start. (The paper discusses many other choices too.)
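A minimal sketch of the initialisation described above, written in plain Python so it runs without a deep-learning framework. The helper names (`init_policy_head`, `action_mean_and_std`) and the uniform fan-in initialisation of the output layer are assumptions for illustration, not code from this repo. The std is parameterised here as `exp(log_std)`; other parameterisations (for example softplus) are possible.

```python
import math
import random


def init_policy_head(in_dim, action_dim, init_std=0.5, scale=0.01, seed=0):
    """Hypothetical helper: build (weights, bias, log_std) for a Gaussian policy head."""
    rng = random.Random(seed)
    # Assumed fan-in uniform init, then shrink every output-layer weight by `scale` (0.01)
    bound = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-bound, bound) * scale for _ in range(in_dim)]
               for _ in range(action_dim)]
    bias = [0.0] * action_dim
    # State-independent log-std chosen so that exp(log_std) == init_std in every dimension
    log_std = [math.log(init_std)] * action_dim
    return weights, bias, log_std


def action_mean_and_std(features, weights, bias, log_std):
    """Linear output layer for the action mean, plus the per-dimension std."""
    mean = [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]
    std = [math.exp(s) for s in log_std]
    return mean, std
```

With the 0.01 scaling, the initial action means stay close to zero regardless of the input features, so early exploration is governed almost entirely by the fixed std of 0.5.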

ChenDRAG commented Mar 31, 2021

@henrycharlesworth I have tried a number of the suggestions from the paper you mentioned (ablation studies suggest that some of them help and some, for now, do not) and implemented the "recompute advantage" strategy, which does help. It is included in my MuJoCo benchmark here; check out the details if you are interested.
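A minimal sketch of the "recompute advantage" strategy: instead of computing GAE once per rollout, the advantages and returns are recomputed with the current value function at the start of every PPO epoch. The names `compute_gae` and `ppo_update`, the rollout dict keys, and the `value_fn`/`update_fn` callables are hypothetical, not this repo's or tianshou's API. The done-flag convention (`dones[t] == 1` when the episode ended after step `t`) is also an assumption.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    dones[t] == 1 means the episode ended after step t, so the value of the
    next state is not bootstrapped across that boundary.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # One-step TD error using the critic's current estimates
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_update(rollout, value_fn, update_fn, n_epochs=4, gamma=0.99, lam=0.95):
    """Run n_epochs of updates, recomputing advantages before each epoch."""
    for _ in range(n_epochs):
        # Re-evaluate the critic: its parameters changed during the previous epoch
        values = [value_fn(o) for o in rollout["obs"]]
        last_value = value_fn(rollout["last_obs"])
        advantages, returns = compute_gae(
            rollout["rewards"], values, last_value, rollout["dones"], gamma, lam
        )
        # update_fn is expected to do the minibatch policy/value optimisation
        update_fn(rollout, advantages, returns)
```

As a sanity check, with a zero critic, `gamma=1.0` and `lam=1.0`, the advantages reduce to reward-to-go: `compute_gae([1, 1, 1], [0, 0, 0], 0.0, [0, 0, 0], gamma=1.0, lam=1.0)` gives advantages `[3.0, 2.0, 1.0]`. The extra cost is one critic forward pass over the rollout per epoch, in exchange for advantages that match the value function actually being trained.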
