
RLLIB QMIX example does not work #12

Open
josjo80 opened this issue Jul 5, 2019 · 6 comments


josjo80 commented Jul 5, 2019

There appears to be a problem when using a masked action space with the QMIX algorithm. I think qmix_policy_graph expects at least one valid action to be available at every step.

Full traceback is below:

  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ^[[36mray_QMixTrainer:train()^[[39m (pid=25398, host=cassini)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 354, in train
    raise e
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 340, in train
    result = Trainable.train(self)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/dqn/dqn.py", line 242, in _train
    self.optimizer.step()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/optimizers/sync_batch_replay_optimizer.py", line 84, in step
    return self._optimize()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/optimizers/sync_batch_replay_optimizer.py", line 108, in _optimize
    info_dict = self.local_evaluator.learn_on_batch(samples)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 581, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/qmix/qmix_policy_graph.py", line 296, in learn_on_batch
    next_obs, action_mask, next_action_mask)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/qmix/qmix_policy_graph.py", line 108, in forward
    there may be a state with no valid actions."
AssertionError: target_max_qvals contains a masked action;             there may be a state with no valid actions.```
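
The failing check lives in qmix_policy_graph.py's forward(). Roughly (a minimal sketch, not RLlib's actual code): invalid actions' Q-values are pushed to a large negative sentinel before taking the max, so any timestep whose action mask allows no action makes the "max" Q-value itself a masked value and trips the assertion. Running this sketch reproduces the failure mode:

```python
# Minimal sketch of the failure mode (not RLlib's actual code): row 3 of the
# mask allows no valid action, so its masked max is the -1e9 sentinel and the
# assertion fires, just as in the traceback above.
import torch

q_values = torch.randn(4, 5)       # [batch, n_actions]
action_mask = torch.zeros(4, 5)    # 1 = valid, 0 = invalid
action_mask[:3, 0] = 1             # rows 0-2 allow action 0; row 3 allows nothing
masked_q = q_values - (1 - action_mask) * 1e9
target_max_qvals = masked_q.max(dim=1)[0]
assert not (target_max_qvals < -1e8).any(), (
    "target_max_qvals contains a masked action; "
    "there may be a state with no valid actions.")
```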
@richardliaw (Contributor)

cc @ericl


EC2EZ4RD commented Nov 6, 2019

I found this problem mainly comes from enabling double Q-learning (the double_q setting in QMIX's default config). If you set "double_q": False, QMIX can run.
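
For reference, a minimal sketch of that override (the "grouped_twostep" env name is registered by RLlib's twostep_game example; substitute your own grouped multi-agent env):

```python
# Sketch of the suggested workaround, not a full training script: override
# "double_q" in the QMIX config passed to Tune.
import ray
from ray import tune

ray.init()
tune.run(
    "QMIX",
    stop={"training_iteration": 10},
    config={
        "env": "grouped_twostep",  # env name from RLlib's twostep_game example
        "mixer": "qmix",           # default monotonic mixing network
        "double_q": False,         # the workaround: disable double Q-learning
    },
)
```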

@plutonic88

I was able to run RLlib's QMIX in the StarCraft II env. However, the policy does not converge.
Any suggestions are appreciated.

@EC2EZ4RD

I forgot how to make it converge. I recommend using PyMARL instead of RLlib if you want to explore research ideas.

@xiaoToby

> I found this problem mainly comes from enabling double Q-learning (the double_q setting in QMIX's default config). If you set "double_q": False, QMIX can run.

Hi, where do I set "double_q": False?
And when I run this example, the error log is below:
```
(RolloutWorker pid=44372) ray::RolloutWorker.__init__() (pid=44372, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002577D997BB0>)
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 658, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 699, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(RolloutWorker pid=44372)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(RolloutWorker pid=44372)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 511, in __init__
(RolloutWorker pid=44372)     check_env(self.env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 78, in check_env
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: Traceback (most recent call last):
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 65, in check_env
(RolloutWorker pid=44372)     check_multiagent_environments(env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 268, in check_multiagent_environments
(RolloutWorker pid=44372)     next_obs, reward, done, info = env.step(sampled_action)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\env\wrappers\group_agents_wrapper.py", line 76, in step
(RolloutWorker pid=44372)     obs, rewards, dones, infos = self.env.step(action_dict)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\smac\smac\examples\rllib\env.py", line 82, in step
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: You must supply an action for agent: 0
```

How can I fix this and get the example running? Thanks @EC2EZ4RD
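
The frame that raises here is RLlib's environment pre-checker (ray/rllib/utils/pre_checks/env.py), which steps the env with a sampled action dict that may not cover every agent, while the SMAC example env requires actions for all agents. One possible workaround, assuming your Ray version is recent enough to have the disable_env_checking config key (it does not exist in older releases), is to skip the pre-check:

```python
# Sketch of a possible workaround (assumption: the "disable_env_checking"
# key is available in your Ray version; verify before relying on it).
# "smac_env" is a hypothetical name for your registered SMAC env.
from ray import tune

tune.run(
    "QMIX",
    config={
        "env": "smac_env",
        "disable_env_checking": True,  # bypass check_env() / check_multiagent_environments()
    },
)
```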

@MichaelXCChen

@xiaoToby One key difference between the default PyMARL implementation and the RLlib implementation of QMIX for SMAC is that PyMARL feeds the true overall global state into the monotonic mixing network, whereas RLlib only uses the per-agent observations as the global state. So you'd need to modify the default implementation and extract the true global state from the environment so that the mixing network can use it.
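
On the env side, a hedged sketch of that change (the wrapper class name and the "state" observation key are illustrative choices of mine, not code from this repo): expose SMAC's get_state() alongside each agent's observation so a modified mixer can consume it.

```python
# Sketch: a multi-agent env whose observations also carry SMAC's true
# global state. The class name and "state" key are assumptions.
import numpy as np
from gym.spaces import Box, Dict
from smac.env import StarCraft2Env


class SMACWithGlobalState:
    def __init__(self, **smac_args):
        self._env = StarCraft2Env(**smac_args)
        info = self._env.get_env_info()
        self.n_agents = info["n_agents"]
        self.observation_space = Dict({
            "obs": Box(-1.0, 1.0, shape=(info["obs_shape"],), dtype=np.float32),
            "state": Box(-np.inf, np.inf, shape=(info["state_shape"],), dtype=np.float32),
        })

    def reset(self):
        # SMAC's reset() returns (per-agent obs list, global state).
        obs_list, state = self._env.reset()
        return {i: {"obs": o, "state": state} for i, o in enumerate(obs_list)}

    def step(self, action_dict):
        actions = [action_dict[i] for i in range(self.n_agents)]
        reward, done, info = self._env.step(actions)
        state = self._env.get_state()  # the true global state PyMARL feeds the mixer
        obs = {i: {"obs": o, "state": state}
               for i, o in enumerate(self._env.get_obs())}
        rewards = {i: reward / self.n_agents for i in obs}
        return obs, rewards, {"__all__": done}, {i: info for i in obs}
```

The loss and model would still need to be changed to read the "state" key instead of concatenating agent observations; this sketch only covers the environment side.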
