
[Question] Observation in Humanoid/Ant-v2 #1636

Closed
Rowing0914 opened this issue Aug 2, 2019 · 6 comments

Comments

@Rowing0914

Hi,

Recently I've been working on some experiments using MuJoCo/OpenAI Gym, and while checking the returns from env.step() on Humanoid-v2 and Ant-v2 I noticed that the observation vector is mostly zeros. I investigated the source code a bit and also read issue #585, but no one there seems to be asking about the problem I have right now: the actual values in the observations from Humanoid/Ant are dominated by zeros.

So I wonder, is anyone else getting observations like mine?

=== Info of my env ===

  • gym: v0.14.0
  • MuJoCo: v2.0.0
  • Python: 3.6.6
  • Obs in Ant-v2 shown below
[ 4.86801671e-01  9.81827799e-01 -1.64617166e-01 -1.56627964e-02
  9.31130374e-02 -5.24809883e-01  5.23265547e-01  5.24312273e-01
 -5.21959648e-01 -5.24233225e-01 -1.22195542e+00  5.24128510e-01
  5.23526435e-01 -2.90538579e-07 -6.56325909e-07 -1.38336952e-15
  1.22402905e-06 -8.01568735e-07 -2.76759106e-07  2.80583420e-15
 -3.52882797e-15 -2.03051639e-15  9.77448673e-16  2.40798716e-16
 -8.00517472e-16  2.80357298e-15  1.24554476e-15  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00]
@christopherhesse
Contributor

Going to close this in favor of #585, but in this particular case it looks like the zeros are in cfrc_ext, which according to http://www.mujoco.org/book/reference.html holds the external forces on the center of mass of the different components of the model. It's likely that many parts of the model simply have no external forces acting on them. But as this observation I gathered from Ant-v2 shows, they're not always all zero:

[ 2.58922679e-01  1.63536825e-01 -6.37013970e-01 -7.49357197e-01
  7.70240215e-02  5.88669942e-01  1.10035780e+00  5.40773286e-01
 -5.23651238e-01  5.25538532e-01 -5.21748144e-01 -5.70826370e-01
  1.21903678e+00  5.45502006e-02 -6.04703699e-02  3.60616908e-02
  1.65534538e-01  2.15688587e-01 -4.97538018e-02  5.60827621e-01
 -3.84989316e+00  4.09290318e-01 -8.74945315e-02 -3.56383019e-02
  2.64558270e-03  1.28401105e+00  2.24424241e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  1.00000000e+00  7.16574164e-02  3.75826113e-02
 -3.02589527e-01  1.00000000e+00  1.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00]
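For reference, the trailing zeros line up with the cfrc_ext block at the end of the observation. A minimal sketch of how the 111-dimensional Ant-v2 observation decomposes (the split indices are my reading of gym's ant.py for this version, so treat them as an assumption and verify against your install):

```python
import numpy as np

# Assumed layout of the 111-dim Ant-v2 observation:
#   qpos[2:]  -> 13 values (torso height, orientation quaternion, joint angles)
#   qvel      -> 14 values (torso and joint velocities)
#   cfrc_ext  -> 84 values (14 bodies x 6 external contact force components)
obs = np.zeros(111)  # placeholder; substitute a real obs from env.step()

qpos_part, qvel_part, cfrc_ext_part = np.split(obs, [13, 27])
print(qpos_part.shape, qvel_part.shape, cfrc_ext_part.shape)  # (13,) (14,) (84,)
```

With this layout, an observation whose last 84 entries are all zero means cfrc_ext was never populated, which matches what the dump above shows.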

@anyboby

anyboby commented Oct 9, 2020

Hi,
it's a rather old thread, but in case anyone else is wondering about the zero terms:
this issue comes from the combination of mujoco-py >= 2.0 and MuJoCo 200 (see also this thread: #1541), where contact forces are no longer necessarily computed by MuJoCo.
Solutions include downgrading either mujoco-py to 1.5x or MuJoCo to 150.
Overriding the Ant environment's step() to force the computation manually is also a solution:

    # assumes: import numpy as np; import mujoco_py as mjp
    def step(self, a):
        xposbefore = self.get_body_com("torso")[0]
        self.do_simulation(a, self.frame_skip)
        # mujoco-py >= 2.0 with MuJoCo 200 no longer computes external contact
        # forces automatically; trigger the computation manually here:
        mjp.functions.mj_rnePostConstraint(self.sim.model, self.sim.data)
        xposafter = self.get_body_com("torso")[0]
        forward_reward = (xposafter - xposbefore) / self.dt
        ctrl_cost = 0.5 * np.square(a).sum()
        contact_cost = 0.5 * 1e-3 * np.sum(
            np.square(np.clip(self.sim.data.cfrc_ext, -1, 1)))
        survive_reward = 1.0
        reward = forward_reward - ctrl_cost - contact_cost + survive_reward
        state = self.state_vector()
        notdone = np.isfinite(state).all() \
            and state[2] >= 0.2 and state[2] <= 1.0
        done = not notdone
        ob = self._get_obs()
        return ob, reward, done, dict(
            reward_forward=forward_reward,
            reward_ctrl=-ctrl_cost,
            reward_contact=-contact_cost,
            reward_survive=survive_reward)

PS: I think this is actually a rather important issue, since the contact forces are part of the reward function and should stay consistent across the tested mujoco-py/MuJoCo version combinations.
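To illustrate the reward impact with a standalone numpy sketch (the formula mirrors the contact_cost term in the step() above; the force values are made up for illustration): when cfrc_ext is silently zeroed, the contact penalty vanishes, so the same trajectory earns a different reward depending on the version combination.

```python
import numpy as np

def contact_cost(cfrc_ext):
    # Same formula as in the Ant step(): 0.5 * 1e-3 * sum(clip(f, -1, 1)^2)
    return 0.5 * 1e-3 * np.sum(np.square(np.clip(cfrc_ext, -1.0, 1.0)))

forces = np.array([0.0, 2.5, -0.7, 4.1])    # hypothetical contact forces
print(contact_cost(forces))                  # non-zero penalty: 0.001245
print(contact_cost(np.zeros_like(forces)))   # 0.0 -- penalty silently disappears
```

The penalty is small per step, but over long rollouts it shifts the return, which is why cross-version comparisons of published Ant/Humanoid scores can be misleading.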

@DanielTakeshi

I agree with @anyboby; I'm not sure this should be closed.

@johnnylin110

Thanks @anyboby, your comment really helped.
I also want to ask: is your second method (modifying the code) identically equal to your other solution of downgrading mujoco-py to 1.5x? I used the second method in some experiments and want to make sure it is exactly the same before comparing against results from other papers.
Thanks!

@anyboby

anyboby commented Jan 5, 2021

@johnnylin110
Generally, overwriting the environments is not equal to downgrading MuJoCo or mujoco-py, since you're not undoing any of the other changes between versions. I also can't speak for the dynamics of the MuJoCo backend: I don't know whether, for example, the dynamics solvers behave exactly the same across versions, or whether a call to compute contact forces from mujoco-py is slower than an internal call, and so on.

As for the reward functions: given equal states, this code produced the same rewards as mujoco150 + mjpy 200 for me. But again, there is no guarantee; for absolute certainty you would have to use the same versions as the referenced paper.

@johnnylin110

@anyboby
Thanks for your reply! I will take this into consideration. Much appreciated!
