Unexpected policy behavior in halfcheetah ARS example #74

Open · hdelecki opened this issue Jan 23, 2023 · 3 comments

@hdelecki

Running the halfcheetah_ars.jl example, I expected to see policy behavior similar to what is shown in the docs. Instead, I see that ARS gets a mean reward of around -23 and the resulting policy tends to move backward. Is this the expected behavior?

I'm using Julia 1.8, Ubuntu 20.04, and the main branch of Dojo.jl.

@janbruedigam
Member

The example was removed in the latest version, but the mechanism still exists for people to create their own version. The ant ARS example should work, but might require some tuning of hyperparameters to make it walk properly.
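
For reference, the hyperparameters that usually matter are easiest to see in the basic ARS update itself. The following is just a generic sketch of that update, not Dojo.jl's implementation; `evaluate` stands in for a full policy rollout returning the episode reward:

```julia
using Statistics: std

# One update step of basic ARS (generic sketch, not Dojo.jl's implementation).
# The knobs that typically need tuning:
#   step_size    – learning rate for the parameter update
#   noise        – scale of the random perturbations
#   n_directions – number of perturbation directions sampled per iteration
function ars_update(θ, evaluate; step_size=0.02, noise=0.03, n_directions=8)
    deltas  = [randn(size(θ)...) for _ in 1:n_directions]
    r_plus  = [evaluate(θ .+ noise .* δ) for δ in deltas]
    r_minus = [evaluate(θ .- noise .* δ) for δ in deltas]
    σ_r  = std(vcat(r_plus, r_minus))                  # reward std used for scaling
    grad = sum((r_plus[k] - r_minus[k]) .* deltas[k] for k in 1:n_directions)
    return θ .+ (step_size / (n_directions * σ_r)) .* grad
end
```

In the Dojo example the equivalents of these knobs live in HyperParameters (and the parameter scale in Policy), so that's where I'd start adjusting.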

@hdelecki
Author

Thanks so much! Do you know why the previous version using halfcheetah didn't work?

Are there any specific mechanism parameters or functions I would need to implement to create an environment for halfcheetah similar to the new ant ARS?

@janbruedigam
Member

Not exactly sure what the issue was before, but there were a lot of changes to the simulation and contact behavior. I believe training success is rather sensitive to these parameters and to the reward function, so that could be what broke it.

As a rough starting point:

  • Create a halfcheetah_ars environment similar to https://github.com/dojo-sim/Dojo.jl/blob/main/DojoEnvironments/src/environments/ant_ars.jl
  • Create a training file similar to https://github.com/dojo-sim/Dojo.jl/blob/main/examples/learning/ant_ars.jl
  • These two files should be all you need. Modifying them to use the halfcheetah instead of the ant might work, that is, run without error, but you may also need to change some details like the get_state function or how the reward is calculated, because some dimensions are going to be different.
  • If everything runs, you'll then need to tune the reward in the rollout_policy function; I would guess the tradeoff between control action and forward reward is the important part (see the sketch after this list). You can also change the scale of the parameters in Policy and the noise in HyperParameters to adjust the initial parameters and their updates.
  • As an initial sanity check, verify that the magnitude of the control action makes sense: without training much, just run the simulation and see whether the halfcheetah moves millimeters, centimeters, or meters. If the movement is too small, all the tuning won't make much difference; if it's too big, the simulation might become unstable.
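
To make the reward tradeoff concrete, here is a minimal sketch of a shaped per-step reward. This is not the actual Dojo.jl code: the state readout (torso x-position before and after the step), the control vector, the timestep, and the weights are all assumptions you'd adapt inside your own rollout_policy.

```julia
# Minimal sketch of a shaped per-step reward (names and weights are assumptions,
# not the actual Dojo.jl API):
#   x_before / x_after : torso x-position read from the state before/after the step
#   u                  : control vector for the actuated joints
#   dt                 : simulation timestep
function halfcheetah_step_reward(x_before, x_after, u, dt;
        forward_weight=1.0, control_weight=0.05)
    forward_velocity = (x_after - x_before) / dt   # reward forward (+x) progress
    control_cost = control_weight * sum(abs2, u)   # penalize large control actions
    return forward_weight * forward_velocity - control_cost
end

# Example: 2 cm of forward progress in one 0.05 s step with small torques
halfcheetah_step_reward(0.0, 0.02, fill(0.1, 6), 0.05)  # ≈ 0.397
```

Summing the x-displacement over a short rollout also gives you the magnitude check from the last bullet: millimeters of total movement means the controls or the Policy scale are probably too small, and an exploding simulation means they are too large.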

If you get something to work (including any improvements to the ant), open a pull request and we can integrate it.

@janbruedigam reopened this Apr 13, 2023