Unexpected policy behavior in halfcheetah ARS example #74

Open · hdelecki opened this issue Jan 23, 2023 · 3 comments

@hdelecki

Running the halfcheetah_ars.jl example, I expected to see policy behavior similar to what is shown in the docs. Instead, I see that ARS gets a mean reward of around -23 and the resulting policy tends to move backward. Is this the expected behavior?

I'm using Julia 1.8, Ubuntu 20.04, and the main branch of Dojo.jl.

@janbruedigam
Member

The example was removed in the latest version, but the mechanism still exists for people to create their own version. The ant ARS example should work, but might require some tuning of hyperparameters to make it walk properly.
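
For reference, the hyperparameters that usually matter are easiest to see in the basic ARS update itself. The following is just a generic sketch of that update, not Dojo.jl's implementation; `evaluate` stands in for a full policy rollout returning the episode reward:

```julia
using Statistics: std

# One update step of basic ARS (generic sketch, not Dojo.jl's implementation).
# The knobs that typically need tuning:
#   step_size    – learning rate for the parameter update
#   noise        – scale of the random perturbations
#   n_directions – number of perturbation directions sampled per iteration
function ars_update(θ, evaluate; step_size=0.02, noise=0.03, n_directions=8)
    deltas  = [randn(size(θ)...) for _ in 1:n_directions]
    r_plus  = [evaluate(θ .+ noise .* δ) for δ in deltas]
    r_minus = [evaluate(θ .- noise .* δ) for δ in deltas]
    σ_r  = std(vcat(r_plus, r_minus))                  # reward std used for scaling
    grad = sum((r_plus[k] - r_minus[k]) .* deltas[k] for k in 1:n_directions)
    return θ .+ (step_size / (n_directions * σ_r)) .* grad
end
```

In the Dojo example the equivalents of these knobs live in HyperParameters (and the parameter scale in Policy), so that's where I'd start adjusting.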

@hdelecki
Author

Thanks so much! Do you know why the previous version using halfcheetah didn't work?

Are there any specific mechanism parameters or functions I would need to implement to create an environment for halfcheetah similar to the new ant ARS?

@janbruedigam
Member

Not exactly sure what the issue was before, but there were a lot of changes to the simulation and contact behavior. I believe training success is rather sensitive to these parameters and to the reward function, so that could be what broke it.

As a rough starting point:

  • Create a halfcheetah_ars environment similar to https://github.com/dojo-sim/Dojo.jl/blob/main/DojoEnvironments/src/environments/ant_ars.jl
  • Create a training file similar to https://github.com/dojo-sim/Dojo.jl/blob/main/examples/learning/ant_ars.jl
  • These two files should be all you need. Modifying them to use the halfcheetah instead of the ant might work, that is, run without error, but you may also need to change some details like the get_state function or how the reward is calculated, because some dimensions are going to be different.
  • If everything runs, you'll then need to tune the reward in the rollout_policy function; I would guess the tradeoff between control action and forward reward is the important part (see the sketch after this list). You can also change the scale of the parameters in Policy and the noise in HyperParameters to adjust the initial parameters and their updates.
  • As an initial sanity check, verify that the magnitude of the control action makes sense: without training much, just run the simulation and see whether the halfcheetah moves millimeters, centimeters, or meters. If the movement is too small, all the tuning won't make much difference; if it's too big, the simulation might become unstable.
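
To make the reward tradeoff concrete, here is a minimal sketch of a shaped per-step reward. This is not the actual Dojo.jl code: the state readout (torso x-position before and after the step), the control vector, the timestep, and the weights are all assumptions you'd adapt inside your own rollout_policy.

```julia
# Minimal sketch of a shaped per-step reward (names and weights are assumptions,
# not the actual Dojo.jl API):
#   x_before / x_after : torso x-position read from the state before/after the step
#   u                  : control vector for the actuated joints
#   dt                 : simulation timestep
function halfcheetah_step_reward(x_before, x_after, u, dt;
        forward_weight=1.0, control_weight=0.05)
    forward_velocity = (x_after - x_before) / dt   # reward forward (+x) progress
    control_cost = control_weight * sum(abs2, u)   # penalize large control actions
    return forward_weight * forward_velocity - control_cost
end

# Example: 2 cm of forward progress in one 0.05 s step with small torques
halfcheetah_step_reward(0.0, 0.02, fill(0.1, 6), 0.05)  # ≈ 0.397
```

Summing the x-displacement over a short rollout also gives you the magnitude check from the last bullet: millimeters of total movement means the controls or the Policy scale are probably too small, and an exploding simulation means they are too large.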

If you get something to work (including any improvements to the ant), open a pull request and we can integrate it.

@janbruedigam reopened this Apr 13, 2023