Are control actions scaled in BRAX environments? #472

nic-barbara · 2024-03-26T02:03:05Z

Networks used as control policies in BRAX seem to have a tanh layer on the output to constrain actions to [-1,1]. However, many of the environments in BRAX have action spaces with a range wider than [-1,1]. For example, the inverted_pendulum environment accepts actions in the range [-3,3].

Is there somewhere that scales the policy output to the actuator ranges for a given environment? Or are all control policies in BRAX currently restricted to actions in [-1,1]?

Thanks in advance for any advice/help!


btaba · 2024-03-27T03:23:27Z

Hi @nic-barbara

It looks like the pendulum and reacher envs are affected by this bug, where we don't scale the actions. Feel free to send over a PR that scales the action. Here's a reference for how that would be implemented:

https://github.com/Farama-Foundation/Gymnasium/blob/373ccf0e005efc2835fa25a56aa4058960de711f/gymnasium/envs/mujoco/mujoco_env.py#L97-L101

This was also implemented here, but it's unused:

brax/brax/envs/wrappers/gym.py

Lines 50 to 51 in 2329ae7

    
    action = jax.tree_map(np.array, self._env.sys.actuator.ctrl_range)
    self.action_space = spaces.Box(action[:, 0], action[:, 1], dtype='float32')
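For readers following along, here is a minimal sketch of the kind of scaling being discussed: an affine map from a tanh-bounded policy output in [-1, 1] to a per-actuator control range. The function name `scale_action` and the plain-list inputs are assumptions for illustration only; this is not the code from the linked Gymnasium file or from any brax PR, and in brax the same arithmetic would presumably be applied to `jax.numpy` arrays built from the actuator control range.

```python
def scale_action(action, low, high):
    """Affinely map each component of `action` from [-1, 1] to [low_i, high_i].

    `action`, `low` and `high` are equal-length sequences of floats.
    (Hypothetical helper for illustration, not part of brax.)
    """
    return [
        lo + 0.5 * (a + 1.0) * (hi - lo)
        for a, lo, hi in zip(action, low, high)
    ]


# Using the inverted_pendulum range [-3, 3] mentioned earlier in the thread:
# -1 maps to the lower bound, 0 to the midpoint, +1 to the upper bound.
print(scale_action([-1.0, 0.0, 1.0], [-3.0] * 3, [3.0] * 3))  # [-3.0, 0.0, 3.0]
```

With this map the full output range of tanh is used, so the policy reaches the actuator limits only as its output approaches ±1.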

nic-barbara · 2024-03-27T04:13:06Z

Thanks @btaba, I'll take a look! Should we do the same for the humanoid and humanoidstandup environments too? The humanoid is restricted to [-0.4,0.4] on all control inputs which means the policy output will just saturate rather than smoothly hitting the [-1,1] boundaries of tanh. This might make training more difficult?

btaba · 2024-03-27T04:58:23Z

AFAIU we were working off of humanoid-v4, which is in [-1, 1]. I would look at the docstrings in brax. It looks like Farama deleted the docstrings for their older versions...

brax/brax/envs/humanoid.py

Line 48 in 2329ae7

continuous `(action, ...)` all in `[-1, 1]`, where `action` represents the

In practice, at the time these environments were implemented, I tested that training curves and behaviors for all environments looked good. I compared training curves and behaviors in video to an older version of brax, across all physics backends. It'd be awesome if you could do a similar exercise for the environments you edit, to show that policies are at least as good as the base version.

nic-barbara · 2024-03-27T05:04:25Z

If I have time I'll do the same, thanks for the suggestion. Unfortunately I don't have a huge amount of compute power so it might have to wait a while.

You're right that the humanoid says it uses [-1,1] in the docstring, but the actual humanoid.xml file still seems to limit the control inputs with ctrlrange="-.4 .4":

brax/brax/envs/assets/humanoid.xml

Line 6 in 2329ae7

btaba · 2024-03-27T05:21:08Z

Interesting, that's probably why they changed it in v5 :). In this case, the simulator is clipping the actions, and that hasn't been an obvious issue for training humanoid. But it'd be good to ablate if you find the time!
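To make the difference between clipping and scaling concrete, here is a small sketch comparing the two for a single actuator. It assumes the ±0.4 limit quoted from humanoid.xml above and a tanh output layer; the function names `clipped` and `scaled` are hypothetical and are not brax or MuJoCo APIs.

```python
import math


def clipped(pre_activation, limit=0.4):
    """tanh policy output, then hard-clipped to [-limit, limit] (what a clipping simulator would do)."""
    a = math.tanh(pre_activation)
    return max(-limit, min(limit, a))


def scaled(pre_activation, limit=0.4):
    """tanh policy output rescaled so that [-1, 1] maps onto [-limit, limit]."""
    return limit * math.tanh(pre_activation)


# Compare the applied control for a few pre-activation values.
for x in (0.2, 0.5, 1.0, 2.0):
    print(f"x={x:.1f}  clipped={clipped(x):.3f}  scaled={scaled(x):.3f}")
```

Under these assumptions, the clipped control is pinned at the limit once tanh exceeds 0.4, so further changes in the policy output have no effect on the applied control, whereas the scaled control keeps approaching the limit smoothly. Whether that matters for training is the open question in this thread.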

nic-barbara · 2024-03-27T06:42:38Z

@btaba I just submitted #473, let me know what you think.

nic-barbara changed the title ~~Are control actions correctly scaled in BRAX environments?~~ Are control actions scaled in BRAX environments? Mar 26, 2024

btaba assigned nic-barbara Mar 27, 2024

nic-barbara mentioned this issue Mar 27, 2024

Scaling control actions for BRAX environments #473

Merged

btaba closed this as completed in #473 May 20, 2024
