Are control actions scaled in BRAX environments? #472

Closed
nic-barbara opened this issue Mar 26, 2024 · 6 comments · Fixed by #473

@nic-barbara
Contributor

nic-barbara commented Mar 26, 2024

Networks used as control policies in BRAX seem to have a tanh layer on the output to constrain actions to [-1, 1]. However, many of the environments in BRAX have action spaces with a range greater than [-1, 1]. For example, the inverted_pendulum environment accepts actions in the range [-3, 3].

Is there somewhere that scales the policy output to the actuator ranges for a given environment? Or are all control policies in BRAX currently restricted to actions in [-1,1]?

Thanks in advance for any advice/help!
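A minimal sketch of the rescaling being asked about. It assumes the policy emits tanh-squashed actions in [-1, 1] and that each actuator's bounds are known (for example from its `ctrlrange`). `scale_action` is a hypothetical helper written in plain Python for self-containment; it is not part of Brax's API, and a real environment would apply the same affine map to its JAX arrays.

```python
def scale_action(action, low, high):
    """Affinely map each component of `action` from [-1, 1] onto [low[i], high[i]].

    -1 maps to low[i], +1 maps to high[i], and 0 maps to the midpoint.
    """
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low and high must have the same length")
    return [lo + (a + 1.0) * 0.5 * (hi - lo) for a, lo, hi in zip(action, low, high)]


# Example: a single actuator with bounds [-3, 3], as in inverted_pendulum.
print(scale_action([0.5], [-3.0], [3.0]))  # [1.5]
```

Because the map is one-to-one, every distinct policy output still reaches the actuator as a distinct control value, across the full [-3, 3] range.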

@nic-barbara nic-barbara changed the title from "Are control actions correctly scaled in BRAX environments?" to "Are control actions scaled in BRAX environments?" on Mar 26, 2024
@btaba
Collaborator

btaba commented Mar 27, 2024

Hi @nic-barbara

It looks like the pendulum and reacher envs are affected by this bug: we don't scale the actions. Feel free to send over a PR that scales the actions. Here's a reference for how that could be implemented:

https://github.com/Farama-Foundation/Gymnasium/blob/373ccf0e005efc2835fa25a56aa4058960de711f/gymnasium/envs/mujoco/mujoco_env.py#L97-L101

This was also implemented here, but it's unused:

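```python
# Builds the gym action space from each actuator's ctrl_range (column 0 = low, column 1 = high).
action = jax.tree_map(np.array, self._env.sys.actuator.ctrl_range)
self.action_space = spaces.Box(action[:, 0], action[:, 1], dtype='float32')
```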

@nic-barbara
Contributor Author

Thanks @btaba, I'll take a look! Should we do the same for the humanoid and humanoidstandup environments too? The humanoid restricts all control inputs to [-0.4, 0.4], which means the policy output will just saturate at those bounds rather than smoothly approaching the [-1, 1] limits of tanh. Might this make training more difficult?
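To make the saturation concern concrete, here is a small illustration assuming every humanoid motor uses `ctrlrange="-.4 .4"` and the simulator clamps controls to that range. `clip` and `rescale` are hypothetical helpers for illustration, not Brax or MuJoCo functions.

```python
# Hypothetical illustration: every humanoid motor uses ctrlrange="-.4 .4".
LOW, HIGH = -0.4, 0.4


def clip(a, low=LOW, high=HIGH):
    """Clamp one control value to [low, high], as the simulator does for ctrllimited motors."""
    return min(max(a, low), high)


def rescale(a, low=LOW, high=HIGH):
    """Affinely map one tanh output in [-1, 1] onto [low, high]."""
    return low + (a + 1.0) * 0.5 * (high - low)


tanh_outputs = [0.2, 0.5, 0.9, 1.0]
clipped = [clip(a) for a in tanh_outputs]      # [0.2, 0.4, 0.4, 0.4]: three outputs collapse
rescaled = [rescale(a) for a in tanh_outputs]  # roughly [0.08, 0.2, 0.36, 0.4]: order preserved

# Share of tanh's output interval [-1, 1] that clipping pins to a bound
# (valid when [LOW, HIGH] lies inside [-1, 1]).
saturated_fraction = 1.0 - (HIGH - LOW) / 2.0  # about 0.6
```

Under clipping, any output with |a| > 0.4 reaches the simulator as the same bound value, so about 60% of tanh's output interval is indistinguishable at the actuator; rescaling keeps the mapping one-to-one.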

@btaba
Copy link
Collaborator

btaba commented Mar 27, 2024

AFAIU we were working off of humanoid-v4, whose action space is [-1, 1]. I would look at the docstrings in brax; it looks like Farama deleted the docstrings for their older versions...

continuous `(action, ...)` all in `[-1, 1]`, where `action` represents the

In practice, I checked that training curves and behaviors looked good for all environments at the time they were implemented. I compared training curves and behaviors in video against an older version of brax, across all physics backends. It'd be awesome if you could do a similar exercise for the environments you edit, to show that policies are at least as good as with the base version.

@nic-barbara
Contributor Author

If I have time I'll do the same, thanks for the suggestion. Unfortunately, I don't have a huge amount of compute, so it might have to wait a while.

You're right that the humanoid says it uses [-1,1] in the docstring, but the actual humanoid.xml file still seems to limit the control inputs with ctrlrange="-.4 .4":

```xml
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
```
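One way to double-check which bounds a model declares is to read the `ctrlrange` attributes straight from the MJCF. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. `motor_ctrlranges` and the trimmed XML fragment are hypothetical, and the parser only reads attributes literally. It does not resolve MJCF default-class inheritance the way MuJoCo's compiler does, so for compiled values the `sys.actuator.ctrl_range` array used in the wrapper snippet above is the more reliable source.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical MJCF fragment modeled on the <motor> line quoted above.
MJCF = """
<mujoco>
  <default>
    <motor ctrllimited="true" ctrlrange="-.4 .4"/>
  </default>
</mujoco>
"""


def motor_ctrlranges(xml_text):
    """Return (low, high) for every <motor> element that declares a ctrlrange attribute."""
    root = ET.fromstring(xml_text.strip())
    ranges = []
    for motor in root.iter("motor"):
        attr = motor.get("ctrlrange")
        if attr is not None:
            low, high = (float(x) for x in attr.split())
            ranges.append((low, high))
    return ranges


print(motor_ctrlranges(MJCF))  # [(-0.4, 0.4)]
```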

@btaba
Collaborator

btaba commented Mar 27, 2024

Interesting, that's probably why they changed it in v5 :). In this case, the simulator is clipping the actions, and that hasn't been an obvious issue for training humanoid. But it'd be good to ablate if you find the time!

@nic-barbara
Contributor Author

@btaba I just submitted #473, let me know what you think.

2 participants