Are control actions scaled in BRAX environments? #472
Comments
Hi @nic-barbara, it looks like the pendulum and reacher envs are affected by this bug, where we don't scale the actions. Feel free to send over a PR where you scale the action. Here's a reference of how that would be implemented: This was also implemented here, but it's unused: brax/brax/envs/wrappers/gym.py, lines 50 to 51 in 2329ae7
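For readers following along, here is a minimal sketch of what such action scaling could look like. It is not the Brax implementation: the function name `scale_action` is hypothetical, and it assumes the policy output lies in [-1, 1] and that each actuator has a known lower and upper bound. The [-3, 3] range is the one quoted for inverted_pendulum in this issue.

```python
def scale_action(action, low, high):
    """Affinely map an action in [-1, 1] onto [low, high].

    Hypothetical helper for illustration. It uses only arithmetic
    operators, so it should accept Python floats as well as NumPy or
    JAX arrays (with bounds that broadcast against the action).
    """
    center = (high + low) / 2.0      # midpoint of the actuator range
    half_range = (high - low) / 2.0  # distance from midpoint to either bound
    return center + action * half_range


# Range quoted in this issue for inverted_pendulum; treated as an assumption here.
low, high = -3.0, 3.0
scaled = [scale_action(a, low, high) for a in (-1.0, 0.0, 0.5, 1.0)]
print(scaled)
```

With this map, -1 lands on the lower bound and 1 on the upper bound, so a tanh-squashed policy output can reach the full actuator range.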
Thanks @btaba, I'll take a look! Should we do the same for the
AFAIU we were working off of humanoid-v4, which is in Line 48 in 2329ae7
In practice, I tested that training curves and behaviors for all environments looked good (at the time these environments were implemented). I compared training curves and behaviors in video to an older version of brax, across all physics backends. It'd be awesome if you could do a similar exercise for the environments you edit, to show that policies are at least as good as the base version.
If I have time I'll do the same, thanks for the suggestion. Unfortunately I don't have a huge amount of compute power so it might have to wait a while. You're right that the humanoid says it uses brax/brax/envs/assets/humanoid.xml Line 6 in 2329ae7
Interesting, that's probably why they changed it in v5 :). In this case, the simulator is clipping the actions, and that hasn't been an obvious issue for training humanoid. But it'd be good to ablate if you find the time!
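To make the clipping-versus-scaling distinction concrete, here is a small sketch under assumed values. The control range of [-0.5, 0.5] is hypothetical and chosen only for illustration, and both function names are made up for this example rather than taken from Brax or the simulator.

```python
def clip_action(action, low, high):
    """Saturate the action at the bounds (what a clipping simulator would do)."""
    return max(low, min(high, action))


def scale_action(action, low, high):
    """Affinely map an action in [-1, 1] onto [low, high]."""
    return low + (action + 1.0) * 0.5 * (high - low)


# Hypothetical control range, narrower than the policy's [-1, 1] output.
low, high = -0.5, 0.5

for a in (-1.0, -0.5, 0.0, 0.8, 1.0):
    # Clipping flattens every |a| > 0.5 to the bound; scaling keeps them distinct.
    print(a, clip_action(a, low, high), scale_action(a, low, high))
```

Under clipping, all policy outputs outside the range collapse to the same saturated value, so part of the tanh output range is wasted. Under scaling, the full [-1, 1] interval is spread across the range. Whether this matters for training is what the suggested ablation would show.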
Networks used as control policies in BRAX seem to have a `tanh` layer on the output to constrain actions to `[-1, 1]`. However, many of the environments in BRAX have action spaces with a range greater than `[-1, 1]`. For example, the `inverted_pendulum` environment accepts actions in the range `[-3, 3]`.
Is there somewhere that scales the policy output to the actuator ranges for a given environment? Or are all control policies in BRAX currently restricted to actions in `[-1, 1]`?
Thanks in advance for any advice/help!