Overcooked and OffPolicyAgent #12

Open
ConstantinRuhdorfer opened this issue Apr 26, 2023 · 2 comments

@ConstantinRuhdorfer

Hi,

I adapted the simple example to use the following:

import gym
from overcookedgym.overcooked_utils import LAYOUT_LIST
from pantheonrl.common.agents import OnPolicyAgent, OffPolicyAgent
from stable_baselines3 import PPO, DQN

layout = "simple"
assert layout in LAYOUT_LIST
print(f"Using layout: {layout} from {LAYOUT_LIST}")

env = gym.make("OvercookedMultiEnv-v0", layout_name=layout)

partner = OffPolicyAgent(DQN("MlpPolicy", env, verbose=1))
env.add_partner_agent(partner)

ego = DQN("MlpPolicy", env, verbose=1)
ego.learn(total_timesteps=1000)

I did this just to test OffPolicyAgent, but I keep getting:

Traceback (most recent call last):
  File "/projects/ruhdorfer/msc2023_constantin/src/scripts/train_simple_overcooked.py", line 31, in <module>
    ego.learn(total_timesteps=1000)
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/dqn/dqn.py", line 269, in learn
    return super().learn(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 311, in learn
    rollout = self.collect_rollouts(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 543, in collect_rollouts
    new_obs, rewards, dones, infos = env.step(actions)
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 163, in step
    return self.step_wait()
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 54, in step_wait
    obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/monitor.py", line 95, in step
    observation, reward, done, info = self.env.step(action)
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/gym/wrappers/order_enforcing.py", line 11, in step
    observation, reward, done, info = self.env.step(action)
  File "/projects/ruhdorfer/PantheonRL/pantheonrl/common/multiagentenv.py", line 195, in step
    acts = self._get_actions(self._players, self._obs, action)
  File "/projects/ruhdorfer/PantheonRL/pantheonrl/common/multiagentenv.py", line 157, in _get_actions
    actions.append(agent.get_action(ob))
  File "/projects/ruhdorfer/PantheonRL/pantheonrl/common/agents.py", line 263, in get_action
    self.model._store_transition(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 455, in _store_transition
    for i, done in enumerate(dones):
TypeError: 'bool' object is not iterable

This seems to happen because SB3 expects multiple dones from env.step (see new_obs, rewards, dones, infos = env.step(actions) in stable_baselines3/common/off_policy_algorithm.py, line 543 in the traceback above), whereas Overcooked returns only a single done in overcookedgym/overcooked.py:80.
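The failure mode can be reproduced without SB3 or Overcooked. The sketch below is only an illustration: count_finished is a hypothetical stand-in that mirrors the `for i, done in enumerate(dones)` loop shown in the traceback, and it is not SB3's actual `_store_transition`.

```python
def count_finished(dones):
    """Hypothetical stand-in for a loop that iterates over done flags.

    Mirrors the `for i, done in enumerate(dones)` pattern from the
    traceback; it is NOT the real SB3 implementation.
    """
    finished = 0
    for _i, done in enumerate(dones):
        if done:
            finished += 1
    return finished


# A bare bool (what a single-done environment returns) cannot be iterated.
try:
    count_finished(False)
    message = None
except TypeError as err:
    message = str(err)

# Wrapping the same flag in a list makes the loop work.
wrapped_false = count_finished([False])
wrapped_true = count_finished([True])
```

Here message holds the same TypeError text as in the traceback, while the wrapped calls run through the loop normally.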

Are off-policy algorithms not supported? Is there a good way of fixing this, e.g. by changing line 80 from

return (ego_obs, alt_obs), (reward, reward), done, {}#info

to

return (ego_obs, alt_obs), (reward, reward), [done], {}#info

?

Thank you!

Cheers, Constantin

@ConstantinRuhdorfer

Hi, I can confirm that simply changing line 80 in multi_step in overcookedgym/overcooked.py from:

return (ego_obs, alt_obs), (reward, reward), done, {}#info

to this

return (ego_obs, alt_obs), (reward, reward), [done], {}#info

fixes the issue and still works with OnPolicyAgent and PPO. I will open a PR. Could you comment on whether this has any other implications? Thanks!
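One possible implication worth checking, sketched below under assumptions: in Python a non-empty list is truthy, so any code that tests the returned value with a plain `if done:` would see `[False]` as true. Whether such a check exists in PantheonRL or in the wrappers around the environment is not verified here. as_scalar_done is a hypothetical helper, not part of any library.

```python
def as_scalar_done(done):
    """Hypothetical helper: collapse a bool or a list of bools to one bool."""
    if isinstance(done, (list, tuple)):
        # Treat the episode as finished if any wrapped flag is set.
        return any(done)
    return bool(done)


done = [False]  # shape of the value after the proposed change

# A plain truthiness test looks only at the list being non-empty.
naive_check = bool(done)

# Unwrapping first recovers the intended flag.
safe_check = as_scalar_done(done)

# Plain bools pass through unchanged, so existing callers keep working.
plain_check = as_scalar_done(True)
```

If a caller does rely on `if done:`, unwrapping the value at that point (as in as_scalar_done) keeps the old behaviour while still allowing the list form to be iterated.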

@ConstantinRuhdorfer

PR is here #14
